Abstract

This paper examines the estimation of two-stage clustered RCT designs in education research using the Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for the study population (the finite-population model) or randomly selected from a vaguely defined universe (the super-population model). Appropriate estimators are derived and discussed for each model. Using data from five large-scale clustered RCTs in the education area, the empirical analysis estimates impacts and their standard errors using the considered estimators. For all studies, the estimators yield identical findings concerning statistical significance. However, standard errors sometimes differ, suggesting that policy conclusions from RCTs could be sensitive to the choice of estimator. Thus, a key recommendation is that analysts test the sensitivity of their impact findings using different estimation methods and cluster-level weighting schemes.
Foreword



The National Center for Education Evaluation and Regional Assistance (NCEE) conducts 
unbiased large-scale evaluations of education programs and practices supported by federal funds; 
provides research-based technical assistance to educators and policymakers; and supports the 
synthesis and the widespread dissemination of the results of research and evaluation throughout 
the United States. 

In support of this mission, NCEE promotes methodological advancement in the field of education 
evaluation through investigations involving analyses using existing data sets and explorations of 
applications of new technical methods, including cost-effectiveness of alternative evaluation 
strategies. The results of these methodological investigations are published as commissioned, 
peer reviewed papers, under the series title, Technical Methods Reports, posted on the NCEE 
website at http://ies.ed.gov/ncee/pubs/. These reports are specifically designed for use by 
researchers, methodologists, and evaluation specialists. The reports address current methodological questions and offer guidance on resolving or advancing the application of high-quality evaluation methods in varying educational contexts.

This NCEE Technical Methods paper serves to open up the “black box” of impact estimation for 
applied education researchers, and highlights both the importance of close attention to the 
estimation method and the importance of performing sensitivity tests using different estimation 
methods. Using the Neyman causal inference framework that underlies experiments, the report 
examines the estimation of impacts in two-stage clustered RCT designs. Several causal models 
are considered. The key distinction between these models is whether potential treatment and 
control group outcomes are considered to be fixed for the study population (the finite-population model) or randomly selected from a vaguely defined universe (the super-population model). Appropriate estimators are derived and discussed for each model, highlighting the differences in underlying assumptions among them. Using data from five large-scale clustered education RCTs, the empirical analysis estimates impacts and their standard errors using the considered estimators to assess whether impact findings are sensitive to the use of different estimation methods and the cluster-level weighting schemes that each employs.








Contents 



Chapter 1: Introduction

Chapter 2: The Neyman Causal Inference Model

The Neyman Finite-Population Model for Two-Stage Clustered Designs

The Super-Population Model for Two-Stage Clustered Designs

Chapter 3: ATE Parameter Estimation for the Finite-Population Model

Finite-Population Model Without Covariates

Finite-Population Model with Covariates

Chapter 4: ATE Parameter Estimation for the Super-Population Model

Super-Population Model Without Covariates

Super-Population Model With Covariates

Chapter 5: Variance Component Estimation for the Super-Population Model

Balanced Design Estimator

ANOVA Estimator

Maximum Likelihood Estimator

Restricted Maximum Likelihood Estimator

GEE Estimator

Chapter 6: Empirical Analysis

Weights for the Finite-Population and Super-Population Models

Impact Findings

Chapter 7: Summary and Conclusions

Appendix A

Appendix B

References








List of Tables 



Table 3.1: Routines in the Considered Statistical Packages for Estimating ATE Parameters and Their Standard Errors, by Model

Table 6.1: Information on Weighting Schemes for the FP and SP Models, by Study

Table 6.2: Regression-Adjusted Impact Results, by Study








Chapter 1: Introduction 



In randomized control trials (RCTs) of educational interventions, random assignment is often performed at the school or classroom level rather than at the student level. These group-based designs are common because RCTs in the education field often test interventions that provide enhanced services to teachers (for example, training in a new reading or math curriculum or mentoring services) or that affect the entire school (for example, a school-wide social and character development program or restructuring initiative). For these types of interventions, it is infeasible to randomly assign the treatment directly to students.

Under these group-based designs, data are typically collected on students. Thus, when student-level data are used, the statistical procedures for estimating average treatment effects (ATEs) and their standard errors must account for the potential correlation of the outcomes of students within the same groups. In particular, the standard errors of the ATE estimators must be inflated to account for design effects due to clustering.

Over the past 40 years, a large statistical literature across multiple disciplines has discussed the estimation of treatment effects under two-stage clustered designs (see, for example, Rao 1972, Harville 1977, Laird and Ware 1982, Hsiao 1986, Liang and Zeger 1986, Baltagi and Chang 1994, Murray 1998, Raudenbush and Bryk 2002, Wooldridge 2002, and De Leeuw and Meijer 2008). These models have a number of labels, including random effects models, random coefficient models, one-way models, variance components models, panel models, hierarchical linear models (HLM), and linear mixed models. A number of statistical packages have been developed to estimate these models using analysis of variance (ANOVA), maximum likelihood (ML), restricted ML (REML), generalized estimating equations (GEE), and other methods.

This paper contributes to this literature by discussing the estimation and interpretation of the ATE parameter under clustered RCTs using the non-parametric model of causal inference that underlies experimental designs. This model was introduced for non-clustered designs by Neyman (1923) and later developed in Rubin (1974, 1977) and Holland (1986). This article extends this theory to two-stage clustered RCTs and develops regression equations that are consistent with this theory. The analysis focuses on continuous outcomes (such as test scores) and discusses relevant ATE parameters assuming that the outcome data are either (1) fixed for the study population (a finite-population model) or (2) random draws from population outcome distributions (the more common super-population model). Appropriate estimation methods and asymptotic moments are discussed for each model, and the methods are linked to the following commonly used statistical packages: SAS, Stata, R, SUDAAN, and HLM. The paper considers both simple differences-in-means models and those that include baseline covariates.

Finally, ATEs and their standard errors are estimated with the considered methods, using data from five recent large-scale clustered RCTs in the education area. The purpose of this analysis is to examine the robustness of study findings to alternative estimation approaches. This is important because education researchers typically employ the statistical packages and estimation routines with which they are most comfortable, and published articles in the evaluation literature rarely report impact results using alternative estimation schemes. Thus, this article can provide information to education researchers about the assumptions underlying commonly used ATE estimation methods, how these methods work, and the sensitivity of impact findings to alternative estimation strategies. The goal is not to identify the best methods, but to discuss options and interpretation.
The rest of this paper is in six chapters. Chapter 2 discusses the Neyman causal inference model, and Chapters 3 and 4 discuss the estimation of the ATE parameter under the finite- and super-population models, respectively. Chapter 5 discusses methods for estimating variance components for the super-population model, and Chapter 6 presents findings from the empirical analysis. The final chapter presents a summary and conclusions.
Chapter 2: The Neyman Causal Inference Model



This chapter discusses the Neyman finite-population (FP) and super-population (SP) causal inference 
models under two-stage clustered designs — the most common designs used in education RCTs. The focus 
is on continuous outcomes. The theory is then used to derive regression equations for estimating the ATE 
parameters. 



The Neyman Finite-Population Model for Two-Stage Clustered Designs 

Consider an experimental design where $n$ schools (or classrooms) are randomly assigned to either a single treatment condition or a control condition. The sample contains $np$ treatment and $n(1-p)$ control group schools, where $p$ is the sampling rate to the treatment group ($0 < p < 1$). It is assumed that the sample contains $m_i$ students from school $i$ and that there are $M = \sum_{i=1}^{n} m_i$ total students in the sample. It is also assumed that student outcomes are not affected by the treatment status of other students.

It is assumed for now that the $n$ schools and $M$ students define the population universe, which is the FP model considered by Neyman for non-clustered designs. Let $Y_{Tij}$ be the “potential” outcome for student $j$ in school $i$ in the treatment condition and $Y_{Cij}$ be the potential outcome for the student in the control condition. The difference between the two fixed potential outcomes, $(Y_{Tij} - Y_{Cij})$, is the student-level treatment effect, and the ATE parameter, $\beta_1$, is the average treatment effect over all students:

$$\beta_1 = \bar{Y}_T - \bar{Y}_C = \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(Y_{Tij} - Y_{Cij}\right). \tag{1}$$

This ATE parameter cannot be calculated directly because potential outcomes for each student cannot be observed in both the treatment and control conditions. Formally, if $T_i$ is a treatment status indicator variable that equals 1 for treatment schools and 0 for control schools, then the observed outcome for a student, $y_{ij}$, can be expressed as follows:

$$y_{ij} = T_i Y_{Tij} + (1 - T_i) Y_{Cij}. \tag{2}$$

Importantly, the potential outcomes in (2) are fixed and the only source of randomness is $T_i$. Thus, under the Neyman model, the ATE parameter pertains only to those students and schools at the time the study was conducted. Stated differently, the impact findings have internal validity but do not necessarily generalize beyond the study sample. This approach can be justified on the grounds that schools are usually purposively selected for education RCTs, and thus may be a self-selected sample of schools that are willing to participate and that are deemed to be suitable for the study based on their student and teacher populations and typical service offerings. Similarly, students in the study sample may not be representative of all students in the study schools, because they could be a nonrandom subset of students whose parents consented to participate in the study, who provided follow-up data, and who did not leave the study schools between baseline and follow-up.¹

Under this fixed-population scenario, researchers remain agnostic about whether the study results have external validity. Policymakers and other users of the study results can decide whether the impact evidence is sufficient to adopt the intervention on a broader scale, perhaps by examining the similarity of the observable characteristics of schools and students in the study samples to their own contexts, and by using results from subgroup and implementation analyses.

Following the approach for non-clustered designs used by Freedman (2008) and Schochet (2009), a regression model for (2) can be constructed by rewriting (2) as follows:²

$$y_{ij} = \beta_0 + \beta_1 (T_i - p) + \eta_{ij}, \tag{3}$$

where

• $\beta_0 = p\bar{Y}_T + (1-p)\bar{Y}_C$ and $\beta_1 = \bar{Y}_T - \bar{Y}_C$ are parameters to be estimated

• $\eta_{ij} = \alpha_{ij} + \tau_{ij}(T_i - p)$ is an “error” term, where $\alpha_{ij} = p(Y_{Tij} - \bar{Y}_T) + (1-p)(Y_{Cij} - \bar{Y}_C)$ and $\tau_{ij} = (Y_{Tij} - \bar{Y}_T) - (Y_{Cij} - \bar{Y}_C)$

The error term $\eta_{ij}$ is a function of two terms: (1) $\alpha_{ij}$, the expected observed outcome for the student relative to the expected mean observed outcome; and (2) $\tau_{ij}$, the student-level treatment effect relative to the ATE. Note that $\alpha_{ij}$ and $\tau_{ij}$ sum to zero over all students. This model is non-parametric because it does not depend on the distributions of the potential outcomes.
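Because the decomposition in (3) is an algebraic identity, it can be checked numerically. The following minimal sketch assumes NumPy and a small hypothetical population with made-up potential outcomes; it builds $\beta_0$, $\beta_1$, $\alpha_{ij}$, and $\tau_{ij}$ from the definitions above and confirms that (2) and (3) agree for one treatment assignment. The helper name `fp_decomposition` is illustrative only.

```python
import numpy as np

def fp_decomposition(y_t, y_c, p):
    """Return beta0, beta1, alpha_ij, tau_ij of the finite-population model (3).

    y_t, y_c: fixed potential treatment/control outcomes for all M students.
    p: sampling rate to the treatment group.
    """
    ybar_t, ybar_c = y_t.mean(), y_c.mean()
    beta0 = p * ybar_t + (1 - p) * ybar_c
    beta1 = ybar_t - ybar_c                               # ATE parameter in (1)
    alpha = p * (y_t - ybar_t) + (1 - p) * (y_c - ybar_c)
    tau = (y_t - ybar_t) - (y_c - ybar_c)                 # student effect minus the ATE
    return beta0, beta1, alpha, tau

# Hypothetical population: 4 schools with 3, 5, 2, and 4 students.
rng = np.random.default_rng(0)
school = np.repeat(np.arange(4), [3, 5, 2, 4])
y_c = rng.normal(50, 10, school.size)
y_t = y_c + rng.normal(5, 2, school.size)                 # heterogeneous student effects
p = 0.5
beta0, beta1, alpha, tau = fp_decomposition(y_t, y_c, p)

# One possible random assignment: schools 0 and 2 are treated.
T = np.array([1, 0, 1, 0])[school]
y_obs = T * y_t + (1 - T) * y_c                           # equation (2)
eta = alpha + tau * (T - p)
assert np.allclose(y_obs, beta0 + beta1 * (T - p) + eta)  # equation (3) holds exactly
```

The same check passes for any other assignment, since only $T_i$ changes while the potential outcomes stay fixed.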

The model in (3) does not satisfy key assumptions of the usual random effects model, because $\eta_{ij}$ does not have mean zero (over all possible treatment assignment configurations), and, to the extent that $\tau_{ij}$ varies across students, $\eta_{ij}$ is heteroscedastic, $\mathrm{Cov}(\eta_{ij}, \eta_{ij'})$ is not constant for students in the same schools, $\mathrm{Cov}(\eta_{ij}, \eta_{i'j'})$ is nonzero for students in different schools (for $i \neq i'$), and $\eta_{ij}$ is correlated with the regressor $(T_i - p)$:

$$E(\eta_{ij}) = \alpha_{ij}, \quad \mathrm{Var}(\eta_{ij}) = \tau_{ij}^2\, p(1-p), \quad \mathrm{Cov}(\eta_{ij}, \eta_{ij'}) = \tau_{ij}\tau_{ij'}\, p(1-p),$$

$$\mathrm{Cov}(\eta_{ij}, \eta_{i'j'}) = -\tau_{ij}\tau_{i'j'}\, p(1-p)/(n-1), \quad E[(T_i - p)\,\eta_{ij}] = \tau_{ij}\, p(1-p).$$

Note that in this model, the error terms for students within the same schools are correlated only because 
they have the same treatment status, not because they face similar environments. 
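Because randomness in the FP model comes only from $T_i$, these moments can be verified exactly by enumerating every possible assignment of $np$ treatment schools. The sketch below is a hypothetical NumPy check, not code from the paper, on a tiny simulated population of six schools; it averages over all 20 equally likely allocations.

```python
import itertools
import numpy as np

# Hypothetical finite population: n = 6 schools of unequal size, p = 0.5.
rng = np.random.default_rng(1)
n, p = 6, 0.5
school = np.repeat(np.arange(n), [2, 3, 1, 2, 4, 2])
y_c = rng.normal(0, 1, school.size)
y_t = y_c + rng.normal(1, 0.5, school.size)
alpha = p * (y_t - y_t.mean()) + (1 - p) * (y_c - y_c.mean())
tau = (y_t - y_t.mean()) - (y_c - y_c.mean())

# Enumerate every way to choose n*p treatment schools; each is equally likely.
allocations = list(itertools.combinations(range(n), int(n * p)))
T = np.array([np.isin(school, a) for a in allocations], dtype=float)  # (allocations, M)
eta = alpha + tau * (T - p)                                           # eta_ij per allocation

mean_eta = eta.mean(axis=0)
cov_eta = np.cov(eta, rowvar=False, bias=True)    # divide by the number of allocations
cross = ((T - p) * eta).mean(axis=0)              # E[(T_i - p) eta_ij]

# Moments stated in the text: same-school pairs use p(1-p), others -p(1-p)/(n-1).
same_school = school[:, None] == school[None, :]
scale = np.where(same_school, p * (1 - p), -p * (1 - p) / (n - 1))
assert np.allclose(mean_eta, alpha)
assert np.allclose(cov_eta, np.outer(tau, tau) * scale)
assert np.allclose(cross, tau * p * (1 - p))
```

The negative cross-school covariance arises because fixing $np$ treatment schools makes the indicators $T_i$ and $T_{i'}$ negatively correlated.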



¹For cost reasons, in education RCTs, follow-up data are not usually collected for students in the baseline sample who leave the study districts.

²In (3), the term $(T_i - p)$ is used rather than $T_i$ because it simplifies the mathematical proofs presented later in this paper, but this centering has no effect on the findings.
Importantly, the model in (3) should not be confused with a fixed effects model, where cluster effects are
treated as fixed, and cluster-level dummy variables are included in the model. Rather, the model treats 
cluster-level effects as random due to the randomness of treatment status in the model error term. 

Finally, (3) implicitly assumes that schools are weighted by their student sample sizes. An alternative specification is to weight schools equally. In this case, the ATE parameter is $\beta_1^{*} = \bar{Y}_T^{*} - \bar{Y}_C^{*}$, where $\bar{Y}_T^{*} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_i}(Y_{Tij}/m_i)$ and $\bar{Y}_C^{*} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_i}(Y_{Cij}/m_i)$ are averages of school-level means. This ATE parameter pertains to the average school effect in the sample rather than to the average student effect. This weighting scheme will result in different impact estimates than the unweighted analysis if student sample sizes vary across schools and impacts vary by school sample size.
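A toy calculation shows how the two weighting schemes diverge when impacts vary with school size. The sketch assumes NumPy; `ate_parameters` is a hypothetical helper, and the two-school population is invented for illustration.

```python
import numpy as np

def ate_parameters(y_t, y_c, school):
    """Return (student-weighted ATE in (1), school-weighted ATE beta_1^*)."""
    effect = y_t - y_c
    student_weighted = effect.mean()
    # Average the student effects within each school, then average across schools.
    school_means = [effect[school == s].mean() for s in np.unique(school)]
    school_weighted = np.mean(school_means)
    return float(student_weighted), float(school_weighted)

# Hypothetical population: a 1-student school with effect 10, a 3-student school with effect 2.
school = np.array([0, 1, 1, 1])
y_c = np.zeros(4)
y_t = np.array([10.0, 2.0, 2.0, 2.0])
print(ate_parameters(y_t, y_c, school))   # (4.0, 6.0)
```

The large effect in the one-student school receives half the weight under school weighting but only one-quarter under student weighting.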



The Super-Population Model for Two-Stage Clustered Designs 

We now consider an SP version of the Neyman causal inference model where the study schools and students are assumed to be random samples from broader populations (see Imbens and Rubin 2007 and Schochet 2008, 2009). This framework is typically used to estimate impacts under clustered RCTs in the education area, and is consistent with popular linear mixed model approaches, such as HLM.

Under this framework, students are nested within schools. Let $Z_{Ti}$ be the potential outcome (mean posttest score) for school $i$ in the treatment condition and $Z_{Ci}$ be the potential outcome for school $i$ in the control condition. Potential outcomes for the $n$ study schools are assumed to be random draws from potential treatment and control outcome distributions in the study super-population. It is assumed that means and variances of these distributions are finite and denoted by $\mu_T$ and $\sigma_{uT}^2$ for potential treatment outcomes and $\mu_C$ and $\sigma_{uC}^2$ for potential control outcomes. These two outcome distributions also define the distribution of school-level treatment effects in the super-population, which are assumed to have mean $\mu_I$ and variance $\sigma_I^2$.



Suppose next that $m_i$ students are sampled from the student super-population in study school $i$. The potential student-level outcomes $Y_{Tij}$ and $Y_{Cij}$ are now assumed to be random draws from student-level potential outcome distributions (which are conditional on school-level potential outcomes) with respective means $Z_{Ti}$ and $Z_{Ci}$ and respective variances $\sigma_{eT}^2 > 0$ and $\sigma_{eC}^2 > 0$.

Under the SP model, the ATE parameter is $\mu_I = E(Z_{Ti} - Z_{Ci}) = \mu_T - \mu_C$. Thus, the impact findings are now assumed to generalize to the super-population of schools that are “similar” to the study schools. How
should one interpret this super-population? Does it pertain to the study schools over the “long term” for a 
broader universe of students and school staff that change over time? Does it pertain to a broader set of 
schools in the study districts? To similar schools nationwide? The answers to these questions will likely 
depend on the context (and may not exist), but researchers should be aware that the usual approach for 
estimating treatment effects in education research makes the implicit assumption of external validity to a 
school universe that is likely to be vaguely defined. Nonetheless, this approach can be justified on the 
grounds that policymakers may generalize the findings anyway, especially if the study provides a primary 
basis for deciding whether to implement the tested interventions more broadly. Furthermore, this 
approach is more consistent with the Bayesian view that assessing intervention effects is a dynamic 
process that takes place in a context of continuously increasing knowledge. 
As before, we can use (2) to express observed student outcomes in terms of potential outcomes, and can rearrange terms to yield the following regression model:

$$y_{ij} = \alpha_0 + \alpha_1 T_i + (u_i + e_{ij}), \tag{4}$$

where

• $\alpha_0 = \mu_C$ and $\alpha_1 = \mu_T - \mu_C$ (the ATE parameter) are coefficients to be estimated

• $u_i = T_i(Z_{Ti} - \mu_T) + (1 - T_i)(Z_{Ci} - \mu_C)$ is a school-level error term where $E(u_i) = 0$, $E(T_i u_i) = 0$, $\mathrm{Var}(u_i \mid T_i = 1) = \sigma_{uT}^2$, and $\mathrm{Var}(u_i \mid T_i = 0) = \sigma_{uC}^2$

• $e_{ij} = T_i(Y_{Tij} - Z_{Ti}) + (1 - T_i)(Y_{Cij} - Z_{Ci})$ is a student-level error term where $E(e_{ij}) = 0$, $E(T_i e_{ij}) = E(u_i e_{ij}) = 0$, $\mathrm{Var}(e_{ij} \mid T_i = 1) = \sigma_{eT}^2$, and $\mathrm{Var}(e_{ij} \mid T_i = 0) = \sigma_{eC}^2$

Furthermore, if we define $\delta_{ij} = u_i + e_{ij}$ as the total error term:

$$\mathrm{Var}(\delta_{ij} \mid T_i = 1) = \sigma_{uT}^2 + \sigma_{eT}^2, \quad \mathrm{Var}(\delta_{ij} \mid T_i = 0) = \sigma_{uC}^2 + \sigma_{eC}^2, \quad \mathrm{Cov}(\delta_{ij}, \delta_{i'j'}) = 0 \text{ for } i \neq i',$$

$$\mathrm{Cov}(\delta_{ij}, \delta_{ij'} \mid T_i = 1) = \sigma_{uT}^2, \quad \mathrm{Cov}(\delta_{ij}, \delta_{ij'} \mid T_i = 0) = \sigma_{uC}^2.$$



Thus, this model is the usual random effects model with an exchangeable block diagonal variance- 
covariance matrix for the error vector except that variances and covariances are allowed to differ for 
treatments and controls. 
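To see the block-diagonal structure implied by (4), one can simulate the SP data-generating process directly. The sketch below assumes NumPy and hypothetical parameter values (2,000 schools of 10 students, with the variance components listed in the code, none taken from any study). With these settings the sample moments of $\delta_{ij}$ should be close to the expressions above, up to simulation error.

```python
import numpy as np

# Hypothetical super-population parameters (not taken from any study).
rng = np.random.default_rng(2024)
n, m, p = 2000, 10, 0.5
mu_t, mu_c = 0.25, 0.0
var_u = {1: 0.20, 0: 0.15}        # sigma^2_uT, sigma^2_uC
var_e = {1: 0.80, 0: 0.85}        # sigma^2_eT, sigma^2_eC

T = np.zeros(n, dtype=int)
T[rng.choice(n, int(n * p), replace=False)] = 1

# School potential outcomes Z_Ti, Z_Ci; students are drawn around the realized one.
z_t = rng.normal(mu_t, np.sqrt(var_u[1]), n)
z_c = rng.normal(mu_c, np.sqrt(var_u[0]), n)
z = np.where(T == 1, z_t, z_c)
e_sd = np.where(T == 1, np.sqrt(var_e[1]), np.sqrt(var_e[0]))
y = z[:, None] + rng.normal(0.0, 1.0, (n, m)) * e_sd[:, None]   # rows are schools

# Total error delta_ij = y_ij - alpha_0 - alpha_1 * T_i, using the true coefficients.
delta = y - (mu_c + (mu_t - mu_c) * T)[:, None]

def within_school_cov(d):
    """Average of delta_ij * delta_ik over pairs j != k within the same school."""
    k = d.shape[1]
    totals, squares = d.sum(axis=1), (d ** 2).sum(axis=1)
    return ((totals ** 2 - squares) / (k * (k - 1))).mean()

for arm in (1, 0):
    d = delta[T == arm]
    print(f"T={arm}: Var(delta) ~ {d.var():.3f}, within-school Cov ~ {within_school_cov(d):.3f}")
```

The pairwise average uses the identity $\sum_{j \neq k} \delta_{ij}\delta_{ik} = (\sum_j \delta_{ij})^2 - \sum_j \delta_{ij}^2$, which avoids building each school's covariance matrix.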

Finally, note that (4) can also be derived using the following two-level HLM model (Bryk and Raudenbush, 1992):

Level 1: $y_{ij} = z_i + e_{ij}$
Level 2: $z_i = \alpha_0 + \alpha_1 T_i + u_i$,

where $z_i = T_i Z_{Ti} + (1 - T_i) Z_{Ci}$ is the observed school-level outcome, Level 1 corresponds to students, and Level 2 corresponds to schools. Inserting the Level 2 equation into the Level 1 equation yields (4). Thus, the HLM approach is consistent with the SP causal inference theory.
Chapter 3: ATE Parameter Estimation for the Finite-Population Model

This chapter discusses ATE parameter and variance estimation for the FP model with and without 
baseline covariates. Mathematical proofs of asymptotic results are provided in the appendix. It is assumed 
for the remainder of this article that sample sizes of clusters are large enough so that asymptotic results 
are approximately valid (see Bingenheimer and Raudenbush, 2004 for a discussion of this issue). 



Finite-Population Model Without Covariates 

Ordinary least squares (OLS) methods are appropriate for estimating $\beta_1$ in (3), because the ATE parameter for the FP model pertains to the study sample only. The following lemma provides the asymptotic moments of the OLS estimator.

Lemma 1. The simple OLS estimator for $\beta_1$ under the FP model in (3) is $\hat{\beta}_{1,SR} = (\bar{y}_T - \bar{y}_C)$, where $\bar{y}_T$ and $\bar{y}_C$ are (unweighted) sample means for the treatment and control groups, respectively. As $n$ increases to infinity for an increasing sequence of finite populations, $\hat{\beta}_{1,SR}$ is asymptotically unbiased. Furthermore, assume that:

$$\begin{aligned} &\frac{1}{n}\sum_{i=1}^{n} m_i \to \bar{m}, \qquad \frac{1}{n\bar{m}}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}(Y_{Tij} - \bar{Y}_T)(Y_{Tik} - \bar{Y}_T) \to S_T^2, \\ &\frac{1}{n\bar{m}}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}(Y_{Cij} - \bar{Y}_C)(Y_{Cik} - \bar{Y}_C) \to S_C^2, \quad \text{and} \quad \frac{1}{n\bar{m}}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}\tau_{ij}\tau_{ik} \to S_\tau^2, \end{aligned} \tag{5}$$

where $\bar{m}$, $S_T^2$, $S_C^2$, and $S_\tau^2$ are fixed, nonnegative, real numbers. Then, $\hat{\beta}_{1,SR}$ is asymptotically normal with variance:

$$\mathrm{AsyVar}(\hat{\beta}_{1,SR}) = \frac{S_T^2}{n\bar{m}p} + \frac{S_C^2}{n\bar{m}(1-p)} - \frac{S_\tau^2}{n\bar{m}}. \tag{6}$$

The $S_T^2$ and $S_C^2$ terms pertain to the extent to which potential outcomes vary and co-vary across students within the same schools. The $S_\tau^2$ term pertains to the extent to which treatment effects vary and co-vary across students within schools. Note that if student-level treatment effects are constant, $S_\tau^2 = 0$ and $S_T^2 = S_C^2$.
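Formula (6) can be compared with the exact randomization variance in a small population. The sketch below is a hypothetical check, assuming NumPy and equal school sizes, so that $\bar{y}_T - \bar{y}_C$ is exactly a difference in school means. Under that assumption, a short derivation (not given in the text) shows that the exact variance equals (6) multiplied by $n/(n-1)$, because (6) normalizes by $n$ rather than $n-1$; the code verifies this over all 70 allocations of 8 schools. The population values are simulated, and `lemma1_asy_var` is an illustrative name.

```python
import itertools
import numpy as np

def lemma1_asy_var(y_t, y_c, school, p):
    """Asymptotic variance (6) of the difference in means under the FP model."""
    n_mbar = y_t.size                      # n * mbar equals M for a fixed population
    dev_t, dev_c = y_t - y_t.mean(), y_c - y_c.mean()
    tau = dev_t - dev_c

    def s2(x):
        # (1/(n*mbar)) * sum_i sum_j sum_k x_ij x_ik = (1/(n*mbar)) * sum_i (sum_j x_ij)^2
        return (np.bincount(school, weights=x) ** 2).sum() / n_mbar

    return s2(dev_t) / (n_mbar * p) + s2(dev_c) / (n_mbar * (1 - p)) - s2(tau) / n_mbar

# Hypothetical population: 8 schools of 5 students, heterogeneous effects, p = 0.5.
rng = np.random.default_rng(7)
n, m, p = 8, 5, 0.5
school = np.repeat(np.arange(n), m)
y_c = rng.normal(0, 1, n)[school] + rng.normal(0, 1, n * m)
y_t = y_c + 0.3 + rng.normal(0, 0.3, n * m)

# Exact randomization distribution over all C(8, 4) = 70 allocations.
estimates = []
for treated in itertools.combinations(range(n), int(n * p)):
    T = np.isin(school, treated)
    y = np.where(T, y_t, y_c)              # observed outcomes, equation (2)
    estimates.append(y[T].mean() - y[~T].mean())
estimates = np.array(estimates)

exact_var = estimates.var()                # population variance over equally likely allocations
asy_var = lemma1_asy_var(y_t, y_c, school, p)
print(exact_var, asy_var * n / (n - 1))
```

In practice $S_\tau^2$ is unknown, which motivates the conservative approach described next.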



With heterogeneous treatment effects, it is difficult to find a consistent estimator for $S_\tau^2$, because this requires unobserved information on student-level treatment effects. However, because $S_\tau^2 \geq 0$, ignoring this term will provide conservative variance estimators. Following this approach, a consistent estimator for the first two terms on the right-hand side in (6) can be obtained using the population-averaged generalized estimating equation (GEE) approach developed by Liang and Zeger (1986) for clustered data (see also Hardin and Hilbe 2003).
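To describe this method for general applications, it is assumed that $\mathbf{x}_{ij}$ is a row vector of model baseline covariates (including the intercept and $T_i - p$), $\mathbf{y}_i$ is an $m_i \times 1$ column vector of student outcomes, and $\mathbf{V}_i$ is the assumed (“working”) $m_i \times m_i$ covariance structure for $\mathbf{y}_i$. The GEE method for estimating the vector of regression parameters $\boldsymbol{\beta}$ solves the following equation for the score function $S(\boldsymbol{\beta})$:

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left(\frac{\partial \boldsymbol{\mu}_i(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\right)' \mathbf{V}_i^{-1}\left(\mathbf{y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta})\right) = \mathbf{0}, \tag{7}$$

where $\boldsymbol{\mu}_i(\boldsymbol{\beta})$ is the expected value of $\mathbf{y}_i$, which is linked to a linear combination of the covariates through a monotonic differentiable link function $g$, where $g(\mu_{ij}) = \mathbf{x}_{ij}\boldsymbol{\beta}$ and $\mu_{ij} = g^{-1}(\mathbf{x}_{ij}\boldsymbol{\beta})$.

Equation (7) can be solved iteratively using a first-order Taylor series expansion of $S(\boldsymbol{\beta})$ around the current estimate. Under this approach, the estimated parameter vector $\hat{\boldsymbol{\beta}}^{(iter+1)}$ at iteration $(iter + 1)$ can be updated from $\hat{\boldsymbol{\beta}}^{(iter)}$ as follows:

$$\hat{\boldsymbol{\beta}}^{(iter+1)} = \hat{\boldsymbol{\beta}}^{(iter)} + \mathbf{I}_0^{-1} S(\hat{\boldsymbol{\beta}}^{(iter)}),$$

where

$$\mathbf{I}_0 = -E\left(\frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\right) = \sum_{i=1}^{n}\left(\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}\right)'\mathbf{V}_i^{-1}\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}} \tag{8}$$

is the information matrix. The matrix $\mathbf{I}_0$ is sometimes replaced by $\mathbf{J}_0 = -\partial S(\boldsymbol{\beta})/\partial \boldsymbol{\beta}$ (Binder 1983).

The model-based variance estimator of the solution $\hat{\boldsymbol{\beta}}$ is $\mathbf{I}_0^{-1}$. The empirical or robust “sandwich” variance estimator uses the data to correct for the potential misspecification of $\mathbf{V}_i$ and equals $\mathbf{I}_0^{-1}\mathbf{I}_1\mathbf{I}_0^{-1}$, where

$$\mathbf{I}_1 = \sum_{i=1}^{n}\left(\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}\right)'\mathbf{V}_i^{-1}\mathbf{r}_i\mathbf{r}_i'\mathbf{V}_i^{-1}\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}} \tag{9}$$

and $\mathbf{r}_i = (\mathbf{y}_i - \boldsymbol{\mu}_i(\hat{\boldsymbol{\beta}}))$ is an $m_i \times 1$ vector of regression residuals.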



In our application, we assume (1) an independent working correlation structure (that is, $\mathbf{V}_i$ is the identity matrix), (2) an identity link function ($\mu_{ij} = \beta_0 + \beta_1(T_i - p)$), and (3) the empirical sandwich variance estimator. The ATE estimator for this linear model is then $\hat{\beta}_{1,GEE} = (\bar{y}_T - \bar{y}_C)$ with the following asymptotic variance estimator:

$$\mathrm{AsyVar}(\hat{\beta}_{1,GEE}) = \frac{1}{d^2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}(T_i - p)^2\, r_{ij}\, r_{ik} \approx \frac{1}{[n\bar{m}\,p(1-p)]^2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}(T_i - p)^2\, r_{ij}\, r_{ik}, \tag{10}$$

where $d = \sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)^2$ and $r_{ij}$ is the OLS residual for student $j$ in school $i$. This variance estimator is based on the sums of products and cross-products of OLS residuals for students within the same schools. Table 3.1 displays statistical package routines that use this method.
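A minimal NumPy sketch of (10) follows, assuming a simulated balanced design with hypothetical parameter values; `gee_independence_ate` is an illustrative name, not a routine from the packages in Table 3.1. Because $(T_i - p)$ is constant within a school, the triple sum collapses to a sum of squared school-level scores.

```python
import numpy as np

def gee_independence_ate(y, T, school, p):
    """Difference in means and the cluster-robust variance estimator (10).

    y, T, school: student-level arrays; T is the 0/1 treatment status of the student's school.
    """
    y_t_bar, y_c_bar = y[T == 1].mean(), y[T == 0].mean()
    # OLS residuals from regressing y on an intercept and (T - p) are group-mean deviations.
    r = np.where(T == 1, y - y_t_bar, y - y_c_bar)
    c = T - p
    d = (c ** 2).sum()
    # (T_i - p) is constant within a school, so
    # sum_j sum_k (T_i - p)^2 r_ij r_ik = [sum_j (T_i - p) r_ij]^2.
    school_scores = np.bincount(school, weights=c * r)
    return y_t_bar - y_c_bar, (school_scores ** 2).sum() / d ** 2

# Hypothetical balanced design: 40 schools of 12 students, half assigned to treatment.
rng = np.random.default_rng(3)
n, m, p = 40, 12, 0.5
school = np.repeat(np.arange(n), m)
T = rng.permutation(np.repeat([1, 0], n // 2))[school]
y = 0.2 * T + rng.normal(0, 0.5, n)[school] + rng.normal(0, 1, n * m)
est, var = gee_independence_ate(y, T, school, p)
print(f"impact = {est:.3f}, SE = {np.sqrt(var):.3f}")
```

In this balanced example (equal school sizes, exactly half the schools treated), the result coincides with the (2,2) element of the general sandwich $\mathbf{I}_0^{-1}\mathbf{I}_1\mathbf{I}_0^{-1}$ with $\mathbf{V}_i$ equal to the identity; with unequal school sizes the two agree only asymptotically.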

If schools are to be weighted equally under unbalanced designs, the GEE method can be applied by first pre-multiplying the outcome and explanatory variables (including the intercept) by the weights $\sqrt{w_{ij}}$, where $w_{ij} \propto 1/m_i$ (Pfeffermann et al. 1998). Under this approach, it may also be reasonable to weight each school district equally if random assignment is conducted within school districts.
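The pre-multiplication step can be illustrated with least squares on transformed data. The following sketch assumes NumPy, a hypothetical six-school design, and $w_{ij} = 1/m_i$; it shows that the resulting slope equals the difference between the average treatment and control school means, that is, an estimate of the equal-school ATE. Standard errors would then come from the sandwich estimator applied to the transformed data, which the sketch does not compute. The helper `premultiplied_ols` is illustrative.

```python
import numpy as np

def premultiplied_ols(y, X, w):
    """OLS after pre-multiplying y and every column of X (including the intercept) by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical unbalanced design: 6 schools of unequal size, 3 treated.
rng = np.random.default_rng(5)
sizes = np.array([3, 8, 5, 12, 4, 9])
T_school = np.array([1, 0, 1, 0, 1, 0])
p = 0.5
school = np.repeat(np.arange(sizes.size), sizes)
T = T_school[school]
y = 0.5 * T + rng.normal(0, 1, school.size)

w = 1.0 / sizes[school]                                  # w_ij proportional to 1/m_i
X = np.column_stack([np.ones(school.size), T - p])
beta = premultiplied_ols(y, X, w)

# The slope equals the difference between the average treatment and control school means.
school_means = np.bincount(school, weights=y) / sizes
equal_school = school_means[T_school == 1].mean() - school_means[T_school == 0].mean()
print(beta[1], equal_school)
```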

Importantly, as discussed in Murray (1998), the GEE method should be used only if the number of 
clusters in each research condition is at least 20. For smaller sample sizes, simulations demonstrate that 
the Type I error rate may not be close to the nominal level. 

Finally, for the equal-school weighting scheme, model-free permutation (randomization) tests can also be used to test the strong null hypothesis that all student-level treatment effects are zero (Gail et al., 1996). Under this approach, observed school means are used to construct the distribution of all possible treatment effects under the null hypothesis of no impacts. This is done by (1) allocating schools to all possible combinations of $np$ “pseudo-treatment” schools and $n(1-p)$ “pseudo-control” schools, (2) estimating a treatment effect for each of the $n!/[(np)!\,(n(1-p))!]$ allocations, (3) sorting these treatment effects from smallest to largest, (4) observing where in the distribution the treatment effect for the actual treatment-control allocation lies, and (5) rejecting the null hypothesis if the actual treatment effect lies outside the $\alpha/2$ or $1 - (\alpha/2)$ quantiles of the permutation distribution (which will have mean 0).³ The validity of this method does not rely on a model, but only on correct randomization.
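Steps (1) through (5) translate directly into code. This is a minimal sketch, assuming NumPy and hypothetical school means; `permutation_test` is an illustrative helper. By default it enumerates every allocation; the optional `n_draws` argument instead samples random reallocations, as suggested in footnote 3 for moderate $n$.

```python
import itertools
import numpy as np

def permutation_test(school_values, treated, alpha=0.05, n_draws=None, seed=0):
    """Two-sided permutation test on school-level means (or residuals).

    school_values: one value per school; treated: boolean array for the actual allocation.
    If n_draws is None, all C(n, n_T) allocations are enumerated (steps 1-3 in the text);
    otherwise n_draws random reallocations are used.
    """
    v = np.asarray(school_values, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    n, n_t = v.size, int(treated.sum())
    if n_draws is None:
        allocations = (list(c) for c in itertools.combinations(range(n), n_t))
    else:
        rng = np.random.default_rng(seed)
        allocations = (rng.choice(n, n_t, replace=False) for _ in range(n_draws))
    total = v.sum()
    effects = []
    for idx in allocations:
        s_t = v[idx].sum()                                  # pseudo-treatment total
        effects.append(s_t / n_t - (total - s_t) / (n - n_t))
    effects = np.sort(effects)
    actual = v[treated].mean() - v[~treated].mean()         # step 4
    lo, hi = np.quantile(effects, [alpha / 2, 1 - alpha / 2])
    return actual, effects, bool(actual < lo or actual > hi)  # step 5

# Hypothetical school means: the five treatment schools have the five highest means.
treated = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
means = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 3.9, 4.1, 4.0, 3.8, 4.2])
actual, effects, reject = permutation_test(means, treated)
print(round(actual, 2), len(effects), reject)   # 1.02 252 True
```

Because the actual allocation produces the largest of the 252 permutation effects, the null hypothesis is rejected at $\alpha = 0.05$.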

Gail et al. (1996) demonstrate through simulations that Type I error rates of these tests are near nominal 
levels if n is moderate, p is near 0.5, and variances of the outcomes do not differ substantially across 
the treatment and control conditions. These conditions are likely to hold in practice. Furthermore, Gail et 
al. (1996) demonstrate that the procedure performs better using school-level residuals from regression 
models that include baseline covariates (see below). 



Finite-Population Model with Covariates 

We now examine ATE estimators when the FP models include fixed covariates, $\mathbf{q}_{ij}$, pertaining to the pre-randomization period. The covariates are not indexed by $T$ or $C$ because their values are independent of treatment status due to randomization. The covariates could include both school-level covariates and student-level covariates that are centered at school-level means. All covariates are assumed to be centered at grand means.



³For moderate $n$ (say, $n > 30$), the number of possible allocations becomes very large. In these cases, the permutation distribution can be estimated from a large random sample of reallocations of school means to the pseudo-treatment and control groups.
Table 3.1: Routines in the Considered Statistical Packages for Estimating ATE Parameters and Their Standard Errors, by Model

| Estimation Method | Variance-Related Formulas in Text | Statistical Packages and Routines | Notes on Estimation and Specification |
|---|---|---|---|
| Finite-Population Model | | | |
| GEE | (13) | SUDAAN: Regress; SAS: Proc Genmod; Stata: xtgee, or regress with the vce(cluster) option; R: gee or glm function | An independent working correlation structure must be specified to obtain OLS parameter estimates. The empirical sandwich estimator should be specified. The Zeger or Binder optimization method can be specified in most packages. |
| Permutation | NA | None | Used for hypothesis testing using school-level means or regression residuals. |
| Super-Population Model | | | |
| Balanced Design | (22) | SUDAAN: Regress; SAS: Proc Reg or Proc GLM; Stata: regress command; R: lm function | Parameters and standard errors are obtained by applying OLS to the between-school regression model in (21). |
| ANOVA | (24), (25) | SAS: Proc Panel; Stata: xtreg with the sa option | Variance component estimates in (24) and (25) are inserted into (16) and (17) to obtain feasible GLS estimates. |
| ML | (28)-(32) | SAS: Proc Mixed; Stata: xtmixed command; R: nlme package (lme function); HLM2, HLM3, HMLM2 | Yields feasible GLS estimates. Statistical packages use different defaults for using ML or REML, and for using Newton-Raphson, Fisher scoring, or the EM algorithm for optimization. |
| REML | See (33) | Same as for ML | Same as for ML |
| GEE | (34), (35) | Same as for GEE above | An exchangeable working correlation structure must be specified; yields feasible GLS estimates using the model-based or empirical sandwich variance estimators. The Zeger or Binder optimization method can be specified in most packages. |

NA: Not applicable.
In the Neyman model, the covariates are irrelevant variables because (3) is the true model. Thus, the ATE
parameters considered above without covariates pertain also to the models with covariates. 

To examine asymptotic moments of the OLS estimator under the FP model with fixed covariates, we assume in addition to (5) that as $n$ approaches infinity:

$$\frac{1}{n\bar{m}}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}\alpha_{ij}f_{ik} \to S_{\alpha f}, \quad \frac{1}{n\bar{m}}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}f_{ij}f_{ik} \to S_f^2, \quad \text{and} \quad \frac{1}{n\bar{m}}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}h_{ij}f_{ik} \to S_{hf}, \tag{11}$$

where $f_{ij}$ is the student’s predicted value from a full-sample OLS regression of $\alpha_{ij}$ on $\mathbf{q}_{ij}$; $h_{ij}$ is the predicted value from a full-sample OLS regression of $\tau_{ij}$ on $\mathbf{q}_{ij}$; and $S_{\alpha f}$, $S_f^2$, and $S_{hf}$ are fixed, nonnegative real numbers. The following lemma generalizes results in Schochet (2009) and Freedman (2008) to two-stage clustered designs. The proof is provided in the appendix.

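Lemma 2. Let $\hat{\beta}_{1,MR}$ be the multiple regression estimator for $\beta_1$ under the model in (3) and assume (5) and (11). Then, $\hat{\beta}_{1,MR}$ is asymptotically normal with mean $\beta_1$ and variance:

$$\mathrm{AsyVar}(\hat{\beta}_{1,MR}) = \left(\frac{S_T^2}{n\bar{m}p} + \frac{S_C^2}{n\bar{m}(1-p)} - \frac{S_\tau^2}{n\bar{m}}\right) - \frac{1}{n\bar{m}\,p(1-p)}\left(2S_{\alpha f} - S_f^2 + 2(1-2p)S_{hf}\right). \tag{12}$$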



The first bracketed term in (12) is the variance of the OLS estimator under the FP model without covariates. The $(2S_{\alpha f} - S_f^2)$ term is a generalized version of the usual explained sum of squares from a multiple OLS regression, and will typically generate precision gains if the covariates are correlated with potential outcomes. The $2(1-2p)S_{hf}$ term pertains to regression-adjusted covariances between $\alpha_{ij}$ and $\tau_{ij}$ for students within the same school. This term will be zero if $p = 0.5$ or if the covariances between potential outcomes are similar in the treatment and control conditions (which would occur, for example, with constant treatment effects); otherwise this term could have any sign.

A variance estimator for (12) can be obtained using the GEE approach discussed above. Let $\mathbf{X}_i = (\mathbf{K}_i\ \mathbf{T}_i\ \mathbf{Q}_i)$, where $\mathbf{K}_i$ is an $m_i \times 1$ column of 1s for the intercept, $\mathbf{T}_i$ is an $m_i \times 1$ vector containing the $T_i - p$ terms, and $\mathbf{Q}_i$ is a matrix of covariates for school $i$. In this case, a variance estimator is:



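(13) $\widehat{AsyVar}(\hat{\beta}_{1,MR}) = \left[\left(\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{r}_i\mathbf{r}_i'\mathbf{X}_i\right)\left(\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{X}_i\right)^{-1}\right]_{(2,2)}$,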



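where the residuals $\mathbf{r}_i$ are calculated from a full-sample OLS regression of $\mathbf{y}_i$ on $\mathbf{X}_i$, and the $(2,2)$ subscript denotes the diagonal element corresponding to the treatment indicator. The permutation tests discussed above could also be used for significance testing using the school-level residuals $\bar{r}_i$ (for the equal-school weighting scheme).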
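As an illustration of (13), the sketch below computes the OLS impact and the (2,2) element of the sandwich estimator in pure Python for the simplest case with no covariates, so each row of $\mathbf{X}_i$ is $(1, T_i - p)$. It is a minimal sketch under that assumption, not the author's program: the function name cluster_robust_ate and the input layout (a list of (T_i, [y_i1, ..., y_im_i]) pairs) are hypothetical, and no small-sample correction is applied.

```python
def cluster_robust_ate(clusters, p):
    """OLS impact and sandwich variance (13) with X_i rows (1, T_i - p), no covariates.

    clusters: list of (T_i, [y_i1, ..., y_im_i]) pairs (hypothetical layout).
    """
    # Bread: sum_i X_i'X_i as a symmetric 2x2 [[a, b], [b, d]], plus X'y = (s0, s1)
    a = b = d = s0 = s1 = 0.0
    for t, ys in clusters:
        z = t - p
        for y in ys:
            a += 1.0
            b += z
            d += z * z
            s0 += y
            s1 += z * y
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    beta0 = inv[0][0] * s0 + inv[0][1] * s1
    beta1 = inv[1][0] * s0 + inv[1][1] * s1
    # Meat: sum_i (X_i' r_i)(X_i' r_i)', where r_i are the school-i OLS residuals
    m00 = m01 = m11 = 0.0
    for t, ys in clusters:
        z = t - p
        g0 = sum(y - beta0 - beta1 * z for y in ys)  # K_i' r_i
        g1 = z * g0                                   # T_i' r_i (z is constant within a school)
        m00 += g0 * g0
        m01 += g0 * g1
        m11 += g1 * g1
    meat = [[m00, m01], [m01, m11]]
    # (2,2) element of inv * meat * inv
    row = [sum(inv[1][k] * meat[k][c] for k in range(2)) for c in range(2)]
    var = sum(row[k] * inv[k][1] for k in range(2))
    return beta1, var


example = [(1, [3.0, 5.0]), (1, [4.0]), (0, [1.0, 2.0]), (0, [2.0])]
impact, variance = cluster_robust_ate(example, p=0.5)
```

In this toy example the impact equals the difference between the student-weighted treatment and control means (4 - 5/3 = 7/3). Because the residuals sum to zero within each treatment school, the variance comes entirely from the two control schools: (1/9 + 1/9)/3^2 = 2/81.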
Chapter 4: ATE Parameter Estimation For The Super-Population
Model 



This chapter examines ATE parameter estimation for the SP model with and without baseline covariates, where it is assumed that error variances are the same in the treatment and control conditions: $\sigma_{uT}^2 = \sigma_{uC}^2 = \sigma_u^2$ and $\sigma_{eT}^2 = \sigma_{eC}^2 = \sigma_e^2$. This assumption is commonly applied and greatly simplifies the presentation.

This chapter focuses on generalized least squares (GLS) methods that are typically used to provide consistent and efficient estimators for $\alpha_1$ in (4). However, the chapter starts with a discussion of the OLS approach (which produces consistent but inefficient estimates) so that the SP and FP estimators can be compared using a common approach. Methods for estimating variance components to obtain feasible GLS estimates are discussed in Chapter 5.



Super-Population Model Without Covariates 

The SP model in (4) for students in school $i$ can be expressed in vector notation as follows:

(14) $\mathbf{y}_i = \alpha_0 + \alpha_1 T_i + \boldsymbol{\delta}_i$,

where $\boldsymbol{\Omega}_i = E(\boldsymbol{\delta}_i\boldsymbol{\delta}_i')$ is an $m_i \times m_i$ positive definite variance-covariance matrix with diagonal terms $\sigma_u^2 + \sigma_e^2$ and off-diagonal terms $\sigma_u^2$. The estimation of this model using OLS and GLS methods is discussed next.



OLS Methods 

Standard methods (see, for example, Schochet 2008) can be used to show that as $n$ increases to infinity, the OLS estimator $\hat{\alpha}_{1,SR} = (\bar{y}_T - \bar{y}_C)$ is asymptotically normal with mean $\alpha_1$ and asymptotic variance that can be estimated as follows:



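(15) $\widehat{AsyVar}(\hat{\alpha}_{1,SR}) = \frac{1}{p(1-p)}\left(\frac{\sum_{i=1}^{n}\left(m_i^2\hat{\sigma}_u^2 + m_i\hat{\sigma}_e^2\right)}{M^2}\right)$,

where $\hat{\sigma}_u^2$ and $\hat{\sigma}_e^2$ are estimators for $\sigma_u^2$ and $\sigma_e^2$, respectively, and $M = \sum_{i=1}^{n} m_i$ is the total number of students. Note that for a fixed total sample size, this variance is minimized if $p = 0.5$ and $m_i = m$ for all schools (that is, for balanced designs).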
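The computation described for (15), namely summing the elements of each $\hat{\boldsymbol{\Omega}}_i$, dividing by $M^2$, and scaling by $1/[p(1-p)]$, can be sketched as follows, assuming the variance components have already been estimated. The function name sp_ols_variance and the parameter values are hypothetical.

```python
def sp_ols_variance(sizes, sigma2_u, sigma2_e, p):
    """Equation (15): sum the elements of each Omega_i, divide by M^2, scale by 1/(p(1-p))."""
    M = sum(sizes)
    # The elements of Omega_i sum to m_i^2 * sigma_u^2 + m_i * sigma_e^2
    total = sum(m * m * sigma2_u + m * sigma2_e for m in sizes)
    return total / (M * M) / (p * (1.0 - p))


balanced = sp_ols_variance([10] * 20, 0.2, 0.8, 0.5)       # 200 students, equal school sizes
unbalanced = sp_ols_variance([5, 15] * 10, 0.2, 0.8, 0.5)  # same 200 students, unequal sizes
```

Holding M = 200 fixed, the unequal school sizes raise the estimated variance from 0.056 to 0.066, consistent with the statement that (15) is smallest for balanced designs.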



The term in parentheses in (15) can be computed by summing the elements of $\hat{\boldsymbol{\Omega}}_i$ across schools and dividing by $M^2$, where $\hat{\boldsymbol{\Omega}}_i$ is an estimator for $\boldsymbol{\Omega}_i$. Thus, (15) is comparable to the $S_T^2$ and $S_C^2$ terms in (6) for the FP model. An important difference between the two models, however, is that the FP model also contains the $S_\tau^2$ term, which reduces variance, whereas the SP model does not. In theory, then, the variance may be somewhat smaller under the FP model. This is expected, because the SP model assumes external validity, with an associated loss in statistical precision. However, as noted, it is difficult to estimate $S_\tau^2$ for clustered designs; thus, precision gains for the FP model cannot typically be realized in practice.



GLS Methods 

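Consider a generic regression model where the covariate and variance matrices for school $i$ are denoted by $\mathbf{X}_i$ and $\boldsymbol{\Omega}_i$, respectively. The feasible GLS estimator of the parameter vector $\boldsymbol{\alpha}$ is then:

(16) $\hat{\boldsymbol{\alpha}}_{GLS} = \left(\sum_{i=1}^{n}\mathbf{X}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^{n}\mathbf{X}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{y}_i\right)$,

where $\hat{\boldsymbol{\Omega}}_i$ is an estimator for $\boldsymbol{\Omega}_i$.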



In our case $\mathbf{X}_i = [\mathbf{K}_i\ \mathbf{T}_i]$, so (16) reduces to

$\hat{\alpha}_{1,GLS} = \frac{\sum_{i:T_i=1} w_i\bar{y}_i}{\sum_{i:T_i=1} w_i} - \frac{\sum_{i:T_i=0} w_i\bar{y}_i}{\sum_{i:T_i=0} w_i}$,

where $\bar{y}_i$ is the mean outcome in school $i$ and $w_i = [\hat{\sigma}_u^2 + (\hat{\sigma}_e^2/m_i)]^{-1}$ is the associated school-level weight. This is a weighted differences-in-means estimator, where the weights are inverses of the variances of school-level means.
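A minimal sketch of this weighted differences-in-means form of (16), together with its variance (18) discussed below, assuming the school means and plug-in variance components are given. The function name gls_ate and the (T_i, ybar_i, m_i) input layout are hypothetical.

```python
def gls_ate(schools, sigma2_u, sigma2_e):
    """Weighted differences in school means (reduced form of (16)) and variance (18).

    schools: list of (T_i, ybar_i, m_i) tuples (hypothetical layout).
    """
    sum_w = {0: 0.0, 1: 0.0}
    sum_wy = {0: 0.0, 1: 0.0}
    for t, ybar, m in schools:
        w = 1.0 / (sigma2_u + sigma2_e / m)  # inverse variance of the school mean
        sum_w[t] += w
        sum_wy[t] += w * ybar
    impact = sum_wy[1] / sum_w[1] - sum_wy[0] / sum_w[0]
    variance = 1.0 / sum_w[1] + 1.0 / sum_w[0]  # equation (18)
    return impact, variance


# Two schools of 10 students per arm; sigma_u^2 = 0.5 and sigma_e^2 = 5, so every w_i = 1
impact, variance = gls_ate([(1, 2.0, 10), (1, 4.0, 10), (0, 1.0, 10), (0, 1.0, 10)], 0.5, 5.0)
```

With equal weights the estimate is the simple difference in school means (3 - 1 = 2), and the variance equals the balanced-design value 4 x (0.5/4 + 5/40) = 1.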

Equivalently, up to a constant factor that does not affect the estimator, the weights can be expressed as $w_i = [\widehat{ICC} + \{(1 - \widehat{ICC})/m_i\}]^{-1}$, where $\widehat{ICC} = \hat{\sigma}_u^2/(\hat{\sigma}_u^2 + \hat{\sigma}_e^2)$ is the estimated intraclass correlation coefficient. The first ICC term inside the brackets is common to all schools, so the weights differ only through the second term. Schools with smaller variances (more sampled students) receive more weight in the analysis than schools with larger variances (fewer sampled students), because the larger schools provide more information on the super-population parameters $\mu_T$ and $\mu_C$. As ICC approaches zero, the SP weights converge to the FP weights where schools are weighted by their sample sizes. Conversely, as ICC approaches one, the SP weights converge to the FP weights where schools are weighted equally. Under the SP approach, it may be reasonable to weight each school district by the size of its school population if random assignment is conducted within school districts.
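To see the limiting behavior of the SP weights, the sketch below computes $[\widehat{ICC} + (1-\widehat{ICC})/m_i]^{-1}$ for hypothetical school sizes and rescales the weights to sum to the total student sample, as in the note to Table 6.1. The rescaling is a display choice only and does not change the weighted estimator; the function name sp_weights is hypothetical.

```python
def sp_weights(sizes, icc):
    """SP school weights [ICC + (1 - ICC)/m_i]^(-1), rescaled to sum to the student total."""
    raw = [1.0 / (icc + (1.0 - icc) / m) for m in sizes]
    scale = sum(sizes) / sum(raw)  # a common factor leaves the weighted estimator unchanged
    return [w * scale for w in raw]


sizes = [10, 20, 30]
w_icc0 = sp_weights(sizes, 0.0)    # ICC -> 0: proportional to m_i (FP sample-size weights)
w_icc1 = sp_weights(sizes, 1.0)    # ICC -> 1: equal weights (FP equal weights)
w_icc01 = sp_weights(sizes, 0.1)   # an intermediate ICC
```

With ICC = 0 the weights reproduce the school sizes (10, 20, 30), with ICC = 1 each school receives 20, and with ICC = 0.1 the spread of the weights falls in between, mirroring the pattern reported in Table 6.1.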

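It is well known that under weak regularity conditions, the feasible GLS estimator is asymptotically normal with mean $\boldsymbol{\alpha}$ and variance $[E(\mathbf{X}_i'\boldsymbol{\Omega}_i^{-1}\mathbf{X}_i)]^{-1}/n$ (see, for example, Wooldridge 2002). This variance can be estimated as follows:

(17) $\widehat{AsyVar}(\hat{\boldsymbol{\alpha}}_{GLS}) = \left(\sum_{i=1}^{n}\mathbf{X}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{X}_i\right)^{-1}$,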

which in our case reduces to 



(18) $\widehat{AsyVar}(\hat{\alpha}_{1,GLS}) = \left[\sum_{i:T_i=1}\frac{1}{\hat{\sigma}_u^2 + (\hat{\sigma}_e^2/m_i)}\right]^{-1} + \left[\sum_{i:T_i=0}\frac{1}{\hat{\sigma}_u^2 + (\hat{\sigma}_e^2/m_i)}\right]^{-1}$.
For known $\boldsymbol{\Omega}_i$, the GLS estimator is the best linear unbiased estimator (BLUE) (although this may not hold if $\boldsymbol{\Omega}_i$ is replaced by $\hat{\boldsymbol{\Omega}}_i$). The ANOVA, ML, REML, and GEE approaches discussed in Chapter 5 yield feasible GLS estimators where estimators for $\sigma_u^2$ and $\sigma_e^2$ are inserted into (16) and (17).



For a given sample size, the variance in (18) is minimized when $m_i = m$ and $p = 0.5$. Furthermore, if $m_i = m$, the OLS and GLS estimators of $\alpha_1$ are identical and yield the following simple variance estimator:



(19) $\widehat{AsyVar}(\hat{\alpha}_{1,Balanced}) = \frac{1}{p(1-p)}\left(\frac{\hat{\sigma}_u^2}{n} + \frac{\hat{\sigma}_e^2}{nm}\right)$.



Note that replacing $m$ by $\bar{m}$ (the average number of students per school) in (19) yields a serviceable variance estimator for designs where sample sizes vary somewhat across schools, which can be seen by setting $m_i = \bar{m}$ in (18).
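The following sketch compares the exact variance in (18) with (19) evaluated at the average school size for a hypothetical, mildly unbalanced design (schools of 15 and 25 students). The function names and parameter values are illustrative assumptions.

```python
def var_unbalanced(sizes_t, sizes_c, s2u, s2e):
    """Equation (18): inverse of the summed school weights, by research group."""
    w_t = sum(1.0 / (s2u + s2e / m) for m in sizes_t)
    w_c = sum(1.0 / (s2u + s2e / m) for m in sizes_c)
    return 1.0 / w_t + 1.0 / w_c


def var_balanced(n, p, m, s2u, s2e):
    """Equation (19), evaluated at a common (or average) school size m."""
    return (s2u / n + s2e / (n * m)) / (p * (1.0 - p))


sizes = [15, 25] * 5                       # 10 schools per arm, mean size 20
exact = var_unbalanced(sizes, sizes, 0.1, 0.9)
approx = var_balanced(20, 0.5, 20.0, 0.1, 0.9)
rel_gap = abs(exact - approx) / exact
```

For these values the two formulas differ by under 2 percent, and they coincide when every school has 20 students.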



Super-Population Model With Covariates 

Under the SP model with covariates, the covariates $\mathbf{q}_{ij}$ as well as the potential outcomes are considered to be random draws from joint super-population distributions. For the estimation model, the covariate matrix is now $\mathbf{X}_i = [\mathbf{K}_i\ \mathbf{T}_i\ \mathbf{Q}_i]$ and $\boldsymbol{\Omega}_i$ is now conditional on $\mathbf{Q}_i$. In principle, the covariates should be considered irrelevant variables because (14) is the true model. Treating them this way, however, considerably complicates the asymptotics for the GLS estimator, because $\mathbf{Q}_i$ will tend to be correlated with the error term, and $\hat{\boldsymbol{\Omega}}_i$ will differ from the true $\boldsymbol{\Omega}_i$.$^{4}$

Consequently, the following analysis strays somewhat from the Neyman framework and assumes that the true model contains $\mathbf{Q}_i$. In this case, the GLS formulas in (16) and (17) also apply to the SP model with covariates.



4 For the OLS estimator, the first problem can be overcome (as it was for the FP model) and the second problem does not occur. The asymptotic variance of the OLS estimator is similar in form to that for the FP model in (12) but does not include terms comparable to $S_\tau^2$ (not shown).
Chapter 5: Variance Component Estimation for the Super-
Population Model 



Feasible GLS estimation requires estimates of the variance components $\sigma_u^2$ and $\sigma_e^2$. This chapter

discusses key features of ANOVA, ML, REML, and GEE estimation methods that can be used to estimate 
these variance components and that are used in the empirical analysis. To keep the presentation 
manageable, the discussion does not focus on other methods, such as bootstrap, jackknife, and other 
resampling methods. De Leeuw and Meijer (2008) provide an excellent, more detailed discussion of GLS 
estimators for multilevel models. 

For simplicity of exposition, in what follows, let the symbol $\boldsymbol{\Omega}_i$ represent a generic covariance matrix for school $i$, $\mathbf{X}_i$ represent a generic covariate matrix for a school, and $\delta_{ij} = u_i + e_{ij}$ represent a generic normally distributed error for a student.



Balanced Design Estimator 

When $m_i = m$ for all schools (that is, for balanced designs), a consistent variance estimator for the simple differences-in-means estimator has the following simple form:

(20) $\widehat{AsyVar}(\hat{\alpha}_{1,Balanced}) = \frac{S_B^2}{np(1-p)}$,

where

$S_B^2 = \frac{\sum_{i:T_i=1}(\bar{y}_i - \bar{y}_T)^2 + \sum_{i:T_i=0}(\bar{y}_i - \bar{y}_C)^2}{n-2}$

is the variance of the mean outcome between schools (see, for example, Cochran 1963). This estimator is consistent because $E(S_B^2) = \sigma_u^2 + (\sigma_e^2/m)$ (see (19)).
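A minimal sketch of the balanced design estimator in (20), which operates only on school means. The function name balanced_design and the (T_i, ybar_i) input layout are hypothetical, and the sketch assumes at least one school per group and n > 2.

```python
def balanced_design(school_means):
    """Differences in school means with variance (20); school_means: list of (T_i, ybar_i)."""
    treat = [y for t, y in school_means if t == 1]
    ctrl = [y for t, y in school_means if t == 0]
    n = len(school_means)
    p = len(treat) / n
    mean_t = sum(treat) / len(treat)
    mean_c = sum(ctrl) / len(ctrl)
    # S_B^2: pooled within-group variance of the school means, with n - 2 degrees of freedom
    ss = sum((y - mean_t) ** 2 for y in treat) + sum((y - mean_c) ** 2 for y in ctrl)
    s2_b = ss / (n - 2)
    return mean_t - mean_c, s2_b / (n * p * (1.0 - p))


impact, variance = balanced_design([(1, 2.0), (1, 4.0), (0, 1.0), (0, 3.0)])
```

Here the impact is 3 - 2 = 1, S_B^2 = 4/2 = 2, and np(1 - p) = 1, so the estimated variance is 2.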

If covariates are included in the model, (20) can be generalized using the following between-school 
regression model: 

(21) $\bar{y}_i = \alpha_0 + \alpha_1 T_i + \bar{\mathbf{q}}_i'\boldsymbol{\alpha}_2 + \bar{\delta}_i$,

where $\bar{y}_i$ is the school-level mean, $\bar{\mathbf{q}}_i$ is a $k_1 \times 1$ vector of school-level covariates (that could include student-level covariates averaged to the school level), and $\bar{\delta}_i = (u_i + \bar{e}_i)$ is the school-level error term. Estimating (21) by OLS yields the following variance estimator for $\hat{\alpha}_1$:

(22) $\widehat{AsyVar}(\hat{\alpha}_{1,Balanced}) = [(\mathbf{X}'\mathbf{X})^{-1}]_{(2,2)}\,RSS_B/(n - k_1 - 2)$,

where $\mathbf{X} = [\mathbf{K}\ \mathbf{T}\ \mathbf{Q}]$ is the covariate matrix and $RSS_B$ is the regression residual sum of squares.

For balanced designs, $\hat{\alpha}_{1,Balanced}$ is an ANOVA or REML estimator and is minimum variance unbiased under normality of the error terms (Searle 1971). This estimator, however, has no optimal properties for
unbalanced designs. Nonetheless, it is appealing due to its simplicity, because it is based entirely on the 
between-school OLS regression, and produces serviceable estimates for designs that are not too highly 
unbalanced (which is typically the case in practice). As discussed next, estimating variance components to 
account for unbalanced designs becomes considerably more complex. 



ANOVA Estimator 

The ANOVA estimator is a method-of-moments estimator that equates regression residual sums of 
squares to their unobserved expectations and solves these equations to obtain estimators for the variance 
components. ANOVA methods have the advantage that the variance components can be obtained in one 
step using easily-understood OLS regression residuals, rather than iteratively, as is the case for the ML, 
REML, and GEE methods. The disadvantage of the ANOVA methods is that for unbalanced designs, they 
have no optimal properties beyond asymptotic unbiasedness. 

This section discusses the Swamy and Arora (SA; 1972) ANOVA method that was adapted for 
unbalanced designs by Baltagi and Chang (1994). De Leeuw and Meijer (2008) and Baltagi and Chang 
(1994) discuss alternative ANOVA estimators that are similar to the SA method. 

Under the SA method, an estimator for the student-level variance, $\sigma_e^2$, is obtained by first estimating a within-school OLS regression:

(23) $(y_{ij} - \bar{y}_i) = (\mathbf{q}_{ij} - \bar{\mathbf{q}}_i)'\boldsymbol{\gamma} + (e_{ij} - \bar{e}_i)$.



This yields the following consistent variance estimator for $\sigma_e^2$:

(24) $\hat{\sigma}_{e,ANOVA}^2 = RSS_W/\left(\sum_{i=1}^{n} m_i - n - k_2\right)$,

where $RSS_W$ is the regression residual sum of squares from (23) and $k_2$ is the number of student-level (within-school) covariates.

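To obtain an estimator for $\sigma_u^2$, the SA method uses the residual sum of squares $RSS_B$ from the between-school regression in (21), where schools are weighted by their sample sizes. In this case, $RSS_B = \hat{\boldsymbol{\delta}}'\mathbf{W}\hat{\boldsymbol{\delta}} = \boldsymbol{\delta}'\mathbf{B}'\mathbf{W}\mathbf{B}\boldsymbol{\delta}$, where $\boldsymbol{\delta}$ is the $n \times 1$ vector of school-level errors $\bar{\delta}_i$, $\mathbf{W}$ is an $n \times n$ diagonal weight matrix with weights $m_i$ along the diagonal, and $\mathbf{B} = [\mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}]$. Because $E(\boldsymbol{\delta}\boldsymbol{\delta}') = \sigma_u^2\mathbf{I}_n + \sigma_e^2\mathbf{W}^{-1}$, it can be shown using matrix algebra that:

$E(RSS_B) = E(tr[\boldsymbol{\delta}'\mathbf{B}'\mathbf{W}\mathbf{B}\boldsymbol{\delta}]) = \sigma_u^2\,tr[\mathbf{B}'\mathbf{W}\mathbf{B}] + \sigma_e^2\,tr[\mathbf{B}'\mathbf{W}\mathbf{B}\mathbf{W}^{-1}]$

$= \sigma_u^2\left(\sum_{i=1}^{n} m_i - tr[(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{W}\mathbf{X}]\right) + \sigma_e^2(n - k_1 - 2)$,

where $tr$ is the matrix trace operator. Thus, a consistent estimator for $\sigma_u^2$ is:

(25) $\hat{\sigma}_{u,ANOVA}^2 = \left[RSS_B - \hat{\sigma}_{e,ANOVA}^2(n - k_1 - 2)\right]\Big/\left(\sum_{i=1}^{n} m_i - tr[(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{W}\mathbf{X}]\right)$,

which could be negative (leading to a negative ICC estimate).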

The estimators for $\sigma_e^2$ and $\sigma_u^2$ in (24) and (25) can be inserted into (16) and (17) to obtain feasible GLS estimates. Note that for balanced designs, this approach yields (22). The ANOVA approach can be implemented using SAS (see Table 3.1).
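The SA steps can be sketched for the special case with no covariates ($k_1 = k_2 = 0$), where the weighted between-school regression on $[\mathbf{K}\ \mathbf{T}]$ simply fits the $m_i$-weighted treatment and control means. Because the trace term is unchanged when the columns of $\mathbf{X}$ are reparameterized, it is computed here with group indicators. This is a sketch under those assumptions; swamy_arora and its input layout are hypothetical.

```python
def swamy_arora(clusters):
    """SA ANOVA variance components (24)-(25) with no covariates (k_1 = k_2 = 0).

    clusters: list of (T_i, [y_i1, ..., y_im_i]) pairs (hypothetical layout).
    """
    n = len(clusters)
    total_m = sum(len(ys) for _, ys in clusters)
    # (23)-(24): within-school residual sum of squares
    rss_w = 0.0
    for _, ys in clusters:
        ybar = sum(ys) / len(ys)
        rss_w += sum((y - ybar) ** 2 for y in ys)
    s2e = rss_w / (total_m - n)
    # (21) weighted by m_i with X = [K T]: fitted values are m_i-weighted group means
    w = {0: 0.0, 1: 0.0}
    wy = {0: 0.0, 1: 0.0}
    w2 = {0: 0.0, 1: 0.0}
    for t, ys in clusters:
        m = len(ys)
        w[t] += m
        wy[t] += sum(ys)          # equals m_i * ybar_i
        w2[t] += m * m
    fitted = {t: wy[t] / w[t] for t in (0, 1)}
    rss_b = sum(len(ys) * (sum(ys) / len(ys) - fitted[t]) ** 2 for t, ys in clusters)
    # tr[(X'WX)^{-1} X'WWX], computed with group-indicator columns
    trace = w2[0] / w[0] + w2[1] / w[1]
    s2u = (rss_b - s2e * (n - 2)) / (total_m - trace)  # (25); may be negative
    return s2u, s2e


s2u, s2e = swamy_arora([(1, [1.0, 3.0]), (1, [5.0, 7.0]), (0, [0.0, 2.0]), (0, [4.0, 6.0])])
```

In this balanced toy example the within-school estimate is 8/4 = 2 and the between-school estimate is (32 - 2 x 2)/(8 - 4) = 7, which matches S_B^2 - sigma_e^2/m = 8 - 1 from the balanced design approach.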



Maximum Likelihood Estimator 

ML methods simultaneously estimate ATE parameters and variance components, and are often used to 
estimate linear mixed models (such as HLM models) that are popular in the education field (see, for 
example, Raudenbush and Bryk 2002). ML estimators are consistent and asymptotically efficient, but do 
not take into account the loss in degrees of freedom due to the regression coefficients in estimating the 
variance components. 

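To demonstrate the ML method, it is convenient to express $\boldsymbol{\Omega}_i$ as $\sigma_e^2\mathbf{A}_i$, where

(26) $\mathbf{A}_i = \mathbf{I}_{m_i} + \lambda\mathbf{J}_{m_i}$,

where $\mathbf{I}_{m_i}$ is the identity matrix, $\mathbf{J}_{m_i}$ is an $m_i \times m_i$ matrix of 1s, and $\lambda = \sigma_u^2/\sigma_e^2$. Note that $\mathbf{A}_i^{-1} = \mathbf{I}_{m_i} - (m_i + \lambda^{-1})^{-1}\mathbf{J}_{m_i}$. Because of the normality assumption, the log likelihood is: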



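(27) $\log L = -\frac{M}{2}\log(2\pi) - \frac{M}{2}\log\sigma_e^2 - \frac{1}{2}\sum_{i=1}^{n}\log|\mathbf{A}_i| - \frac{1}{2\sigma_e^2}\left[\sum_{i=1}^{n}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\alpha})'\mathbf{A}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\alpha})\right]$,

where $|\mathbf{A}_i|$ denotes the determinant of $\mathbf{A}_i$.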
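A quick numerical check of the inverse formula given below (26), together with the matrix determinant lemma $|\mathbf{A}_i| = 1 + \lambda m_i$ (a standard identity for $\mathbf{I} + \lambda\mathbf{J}$ that the text does not state), which simplifies the $\log|\mathbf{A}_i|$ terms in (27). The helper names and the values $m_i = 5$, $\lambda = 0.3$ are hypothetical.

```python
import math


def build_a(m, lam):
    """A_i = I + lam * J for a school with m students, equation (26)."""
    return [[(1.0 if r == c else 0.0) + lam for c in range(m)] for r in range(m)]


def closed_form_inverse(m, lam):
    """A_i^{-1} = I - (m + 1/lam)^{-1} J, written as lam/(1 + lam*m) so lam = 0 is allowed."""
    k = lam / (1.0 + lam * m)
    return [[(1.0 if r == c else 0.0) - k for c in range(m)] for r in range(m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[piv][k] == 0.0:
            return 0.0
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            d = -d
        d *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return d


m, lam = 5, 0.3
product = matmul(build_a(m, lam), closed_form_inverse(m, lam))  # should be the identity
log_det = math.log(det(build_a(m, lam)))                          # should equal log(1 + lam*m)
```

For these values the determinant is 1 + 0.3 x 5 = 2.5, so each school contributes log(2.5) to the sum of log-determinants in (27).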

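Taking derivatives in (27) with respect to the parameters and setting them equal to zero yields the following closed-form solutions for $\boldsymbol{\alpha}$ and $\sigma_e^2$ (for a given $\lambda$):

(28) $\hat{\boldsymbol{\alpha}}_{MLE} = \left(\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{A}_i^{-1}\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{A}_i^{-1}\mathbf{y}_i\right)$, and

(29) $\hat{\sigma}_{e,MLE}^2 = \frac{1}{M}\left[\sum_{i=1}^{n}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\alpha}}_{MLE})'\mathbf{A}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\alpha}}_{MLE})\right]$.

Equation (28) is the feasible GLS estimator in (16).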

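The first-order condition for $\lambda$ is a nonlinear equation that must be solved numerically:

(30) $0 = grad = \frac{\partial\log L}{\partial\lambda} = -\frac{1}{2}\sum_{i=1}^{n}tr[\mathbf{A}_i^{-1}\mathbf{J}_{m_i}] + \frac{1}{2\sigma_e^2}\left[\sum_{i=1}^{n}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\alpha})'[\mathbf{A}_i^{-1}\mathbf{J}_{m_i}\mathbf{A}_i^{-1}](\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\alpha})\right]$.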
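One common iterative method is the Newton-Raphson method, where $\lambda^{(iter+1)}$ is updated from $\lambda^{(iter)}$ as follows:

(31) $\lambda^{(iter+1)} = \lambda^{(iter)} - [H^{(iter)}]^{-1}grad^{(iter)}$,

where

(32) $H = \frac{\partial^2\log L}{\partial\lambda\,\partial\lambda} = \frac{1}{2}\sum_{i=1}^{n}tr[\mathbf{A}_i^{-1}\mathbf{J}_{m_i}\mathbf{A}_i^{-1}\mathbf{J}_{m_i}] - \frac{1}{\sigma_e^2}\left[\sum_{i=1}^{n}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\alpha})'\mathbf{A}_i^{-1}\mathbf{J}_{m_i}\mathbf{A}_i^{-1}\mathbf{J}_{m_i}\mathbf{A}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\alpha})\right]$

is the Hessian.$^{5}$ Other iterative methods replace $H$ in (31) with its expectation $E(H)$, where $-E(H) = .5\sum_{i=1}^{n}tr[\mathbf{A}_i^{-1}\mathbf{J}_{m_i}\mathbf{A}_i^{-1}\mathbf{J}_{m_i}]$ (Fisher scoring), or use the expectation-maximization (EM) algorithm (see Little and Rubin 2002).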

The model parameters can then be estimated using the following steps: (1) obtain an initial value for $\lambda$ (for example, using the ANOVA method), (2) calculate $\hat{\boldsymbol{\alpha}}_{MLE}$ and $\hat{\sigma}_{e,MLE}^2$ using (28) and (29), (3) update $\lambda$ using (30)-(32), and (4) return to Step (2) until convergence is achieved. Final feasible GLS estimators can then be obtained using (16) and (17). Table 3.1 displays statistical package routines that use the ML method.
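The following sketch implements Steps (1) to (4) for the case with no covariates, with one substitution stated plainly: $\lambda$ is updated by Fisher scoring (replacing $H$ with $E(H)$ in (31)) rather than by Newton-Raphson. For these $\mathbf{A}_i$ the traces and quadratic forms in (29)-(30) reduce to school-level sums, for example $tr[\mathbf{A}_i^{-1}\mathbf{J}_{m_i}] = m_i/(1+\lambda m_i)$. The sketch also assumes a fixed starting value of $\lambda = 0.1$ instead of an ANOVA start and uses simulated data; ml_fisher_scoring and its arguments are hypothetical.

```python
import random


def ml_fisher_scoring(clusters, lam=0.1, tol=1e-10, max_iter=200):
    """ML for X_i = [K_i T_i] (no covariates), updating lambda by Fisher scoring.

    clusters: list of (T_i, [y_i1, ..., y_im_i]) pairs (hypothetical layout).
    """
    total_m = sum(len(ys) for _, ys in clusters)

    def profile(lam):
        # Step (2), eq. (28): with A_i = I + lam*J, school means get weight m_i / (1 + lam*m_i)
        sw, swy = {0: 0.0, 1: 0.0}, {0: 0.0, 1: 0.0}
        for t, ys in clusters:
            v = len(ys) / (1.0 + lam * len(ys))
            sw[t] += v
            swy[t] += v * sum(ys) / len(ys)
        mu = {t: swy[t] / sw[t] for t in (0, 1)}
        # Eq. (29), using r'A^{-1}r = sum(r^2) - lam/(1 + lam*m) * (sum r)^2
        quad, stats = 0.0, []
        for t, ys in clusters:
            m = len(ys)
            r = [y - mu[t] for y in ys]
            s = sum(r)
            quad += sum(x * x for x in r) - lam / (1.0 + lam * m) * s * s
            stats.append((m, s))
        s2e = quad / total_m
        # Eq. (30): here r'A^{-1}JA^{-1}r = (sum r)^2 / (1 + lam*m)^2
        grad = sum(-0.5 * m / (1.0 + lam * m) + 0.5 * s * s / (s2e * (1.0 + lam * m) ** 2)
                   for m, s in stats)
        info = sum(0.5 * m * m / (1.0 + lam * m) ** 2 for m, _ in stats)  # -E(H)
        return mu, s2e, grad, info

    converged = False
    for _ in range(max_iter):
        _, _, grad, info = profile(lam)
        step = grad / info               # Fisher-scoring version of (31)
        lam = max(lam + step, 0.0)       # non-negativity constraint, as most packages impose
        if abs(step) < tol:
            converged = True
            break
    mu, s2e, grad, _ = profile(lam)
    return {"impact": mu[1] - mu[0], "sigma2_e": s2e, "sigma2_u": lam * s2e,
            "lam": lam, "grad": grad, "converged": converged}


rng = random.Random(2024)
data = []
for i in range(30):
    t = i % 2
    u = rng.gauss(0.0, 1.0)                      # school effect with sigma_u^2 = 1
    ys = [0.5 * t + u + rng.gauss(0.0, 1.0) for _ in range(rng.randint(5, 25))]
    data.append((t, ys))
fit = ml_fisher_scoring(data)
```

The returned gradient should be essentially zero at convergence. A production routine would also report standard errors from (17) and, following the recommendation below, might relax the non-negativity constraint.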



Note that most statistical packages impose a non-negativity constraint for $\lambda$ at each iteration. Murray (1998) and Stroup and Littell (2002) demonstrate through simulations, however, that this constraint could deflate the Type I error rate and reduce statistical power. Thus, these authors recommend that options be
used in statistical packages that allow for negative variance component estimates. A similar issue applies 
to the REML estimator discussed next. 



Restricted Maximum Likelihood Estimator 

Unlike the ML approach, the REML approach for the variance components adjusts for the degrees of freedom loss due to the estimation of the regression parameters (Patterson and Thompson 1971). The REML approach separates the likelihood into two independent parts, one of which depends only on the variance components (the part of interest). The approach profiles out the covariates by finding a linear combination of the outcomes, $\mathbf{y}^* = \mathbf{L}\mathbf{y}$, whose distribution does not depend on $\boldsymbol{\alpha}$, where $\mathbf{L}$ is an $(M - k) \times M$ matrix and $k$ is the rank of the covariate matrix $\mathbf{X}$.$^{6}$

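To find $\mathbf{L}$, consider first the OLS regression residuals $\mathbf{r} = \mathbf{P}\mathbf{y}$, where $\mathbf{P} = (\mathbf{I}_M - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}')$ is an $M \times M$ idempotent matrix. Note that because $\mathbf{y}$ is assumed to have a multivariate normal distribution,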



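5 The formulas in (30) and (32) can be obtained using (26) and the matrix identities: $\partial|\mathbf{A}_i|/\partial\lambda = |\mathbf{A}_i|\,tr(\mathbf{A}_i^{-1}\partial\mathbf{A}_i/\partial\lambda)$ and $\partial\mathbf{A}_i^{-1}/\partial\lambda = -\mathbf{A}_i^{-1}(\partial\mathbf{A}_i/\partial\lambda)\mathbf{A}_i^{-1}$.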

6 The REML approach does not depend on the specific choice of L , so one choice is derived. 
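$\mathbf{r} \sim N(\mathbf{0}, \mathbf{P}\boldsymbol{\Omega}\mathbf{P}')$, where $\boldsymbol{\Omega}$ is the block-diagonal matrix of the $\boldsymbol{\Omega}_i$; this distribution is independent of $\boldsymbol{\alpha}$. Thus, a solution for $\mathbf{L}$ (which has $(M - k)$ rows) can be obtained from $\mathbf{P}$ by satisfying the relation $\mathbf{P} = \mathbf{L}'\mathbf{L}$ subject to the normalizing condition $\mathbf{I}_{M-k} = \mathbf{L}\mathbf{L}'$. Such a solution can be found using the eigenvectors ($\mathbf{E}$) and eigenvalues of $\mathbf{P}$. Because $\mathbf{P}$ is idempotent, $(M - k)$ eigenvalues will be one (say, the first ones) and the remaining $k$ will be zero. Thus, $\mathbf{L}$ can be calculated as the first $(M - k)$ rows of $\mathbf{E}'$.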
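To illustrate this construction, the sketch below forms $\mathbf{P}$ for a small hypothetical student-level design with $\mathbf{X} = [\mathbf{1}\ \mathbf{T}]$ (so $k = 2$) and checks that it is idempotent with trace $M - k$. Because the eigenvalues of an idempotent matrix are 0 or 1, the trace counts the unit eigenvalues; computing $\mathbf{E}$ itself would need an eigen-solver and is omitted. The helper names are hypothetical.

```python
def residual_maker(treat):
    """P = I_M - X(X'X)^{-1}X' for X = [1, T] at the student level (k = 2)."""
    big_m = len(treat)
    x = [[1.0, float(t)] for t in treat]
    a = float(big_m)
    b = float(sum(treat))
    d = float(sum(t * t for t in treat))
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]  # (X'X)^{-1}
    return [[(1.0 if r == c else 0.0)
             - sum(x[r][i] * inv[i][j] * x[c][j] for i in range(2) for j in range(2))
             for c in range(big_m)] for r in range(big_m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


P = residual_maker([1, 1, 1, 0, 0])
PP = matmul(P, P)
trace = sum(P[i][i] for i in range(len(P)))  # rank of an idempotent matrix, here M - k = 3
```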



Using this $\mathbf{L}$, it follows that $\mathbf{y}^* = \mathbf{L}\mathbf{y} \sim N(\mathbf{0}, \mathbf{L}\boldsymbol{\Omega}\mathbf{L}')$, whose distribution is independent of $\boldsymbol{\alpha}$. The REML log likelihood can be obtained using this distribution. Harville (1974) shows, however, that an equivalent log likelihood that shows more clearly how the regression parameters are profiled out of the likelihood can be expressed as follows:



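(33) $\log L = -\frac{M-k}{2}\log(2\pi) - \frac{M-k}{2}\log(\sigma_e^2) - \frac{1}{2}\sum_{i=1}^{n}\log|\mathbf{A}_i| - \frac{1}{2}\log\left|\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{A}_i^{-1}\mathbf{X}_i\right| + \frac{1}{2}\log\left|\sum_{i=1}^{n}\mathbf{X}_i'\mathbf{X}_i\right| - \frac{1}{2\sigma_e^2}\left[\sum_{i=1}^{n}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\alpha}}_{GLS})'\mathbf{A}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\alpha}}_{GLS})\right]$,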



where $\hat{\boldsymbol{\alpha}}_{GLS}$ is given in (16). This likelihood can be maximized with respect to $\sigma_e^2$ and $\lambda$ using the methods discussed above for the ML estimator (not shown). REML estimates of the variance components
can then be used in (16) and (17) to obtain feasible GLS estimators. REML estimates are asymptotically 
equivalent to ML estimates, but the REML approach tends to produce larger standard errors due to the 
degrees of freedom adjustments. Table 3.1 displays statistical package routines that use the REML 
method. 



GEE Estimator 

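The GEE estimator (discussed above) is also a feasible GLS estimator for the SP model. The ATE parameter estimate obtained from (7) yields the feasible GLS estimator in (16), and the model-based GEE variance estimator using (8) yields the feasible GLS variance estimator in (17). The GEE empirical sandwich variance estimator is $\left(\sum_{i=1}^{n}\mathbf{X}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{X}_i\right)^{-1}\mathbf{I}_1\left(\sum_{i=1}^{n}\mathbf{X}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{X}_i\right)^{-1}$, where

$\mathbf{I}_1 = \sum_{i=1}^{n}\mathbf{X}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{r}_i\mathbf{r}_i'\hat{\boldsymbol{\Omega}}_i^{-1}\mathbf{X}_i$.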

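Under the GEE method, the variance components are estimated iteratively by updating regression residuals $\mathbf{r}_i$. In Step (1), OLS residuals are used to obtain the following consistent estimates for $\sigma_u^2$ and $\sigma_e^2$:

(34) $\hat{\sigma}_{u,GEE}^2 = \hat{\rho}_{GEE}\,s_{GEE}^2$ and $\hat{\sigma}_{e,GEE}^2 = (1 - \hat{\rho}_{GEE})\,s_{GEE}^2$, where

(35) $s_{GEE}^2 = \frac{1}{M-k}\sum_{i=1}^{n}\sum_{j=1}^{m_i}r_{ij}^2$ and $\hat{\rho}_{GEE} = \left(\frac{1}{\sum_{i=1}^{n}m_i(m_i-1) - 2k}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{l \neq j}r_{ij}r_{il}\right)\Big/s_{GEE}^2$.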
These estimates are then used to calculate $\hat{\boldsymbol{\alpha}}_{GLS}$ in (16). In Steps (2) to (4), new residuals are calculated, (34) and (35) are updated, and new estimates of $\hat{\boldsymbol{\alpha}}_{GLS}$ are obtained; these steps are repeated until convergence is achieved. Table 3.1 displays statistical package routines that use this method; these routines require the specification of an exchangeable working correlation matrix.
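A minimal sketch of the moment estimates in (34)-(35), taking the residuals as given. The function name gee_exchangeable and the toy residuals are hypothetical; the cross-product sum uses $(\sum_j r_{ij})^2 - \sum_j r_{ij}^2$, which equals the sum of $r_{ij}r_{il}$ over $j \neq l$.

```python
def gee_exchangeable(residuals, k):
    """Equations (34)-(35) from per-school residual lists r_i and k regression parameters."""
    total_m = sum(len(r) for r in residuals)
    s2 = sum(x * x for r in residuals for x in r) / (total_m - k)
    pairs = sum(len(r) * (len(r) - 1) for r in residuals)  # ordered pairs with j != l
    cross = 0.0
    for r in residuals:
        tot = sum(r)
        cross += tot * tot - sum(x * x for x in r)  # sum over j != l of r_ij * r_il
    rho = cross / (pairs - 2 * k) / s2
    return rho * s2, (1.0 - rho) * s2, rho


s2u, s2e, rho = gee_exchangeable([[1.0, 1.0], [-1.0, -1.0], [2.0, 0.0], [0.0, -2.0]], k=2)
```

For these residuals the pooled variance is 12/6 = 2 and the cross-products sum to 4 over 8 - 4 = 4 effective pairs, so the estimated correlation is 0.5 and both variance components equal 1.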
Chapter 6: Empirical Analysis



This chapter presents ATE estimates and their standard errors using five published large-scale RCTs that 
were funded by the Institute of Education Sciences (IES) at the U.S. Department of Education (ED) and 
several foundations. These RCTs tested the effects of a wide range of education interventions, including 
mentoring programs for new teachers (Glazerman et al. 2008), early elementary school math curricula 
(Agodini et al. 2009), the use of selected computer software in the classroom (Dynarski et al. 2007), 
selected reading comprehension interventions (James-Burdumy et al. 2009), and Teach for America 
(Decker et al. 2004). Across the RCTs, random assignment was conducted at either the school or teacher 
(classroom) level primarily in low-performing school districts, and the key outcome measures were math 
or reading test scores of elementary school students. Appendix Table B.1 provides information for each study.

All studies (except for the Reading Comprehension study) report impact findings using an SP framework
(using HLM models with baseline covariates), although it cannot be determined which specific estimation 
and optimization methods were used for the analyses. This chapter discusses findings from a re-analysis 
of the RCT data using the estimation methods considered above for the FP and SP models. The focus is 
on models that include baseline covariates. Using study documentation, the choice of baseline covariates 
(including blocking indicators), the construction of the outcome measures, and the treatment of missing 
data were as similar as possible to those used by the authors of the study reports. For comparable models, 
the impact results reported below are similar to those presented in the published reports. 

SAS was used to estimate the GEE, balanced design, REML, and ML models. Research has shown that the statistical packages considered in this paper yield similar estimates for common model specifications and optimization routines (West et al. 2007, Shah 1998), so the findings should not hinge on this software choice. To keep the presentation manageable, the ML and REML estimates were obtained using the Newton-Raphson algorithm. The SAS code that was used to estimate the models is displayed in the notes to Table 6.2 below. The permutation tests were conducted using SAS programs written by the author, where permutation distributions were estimated from 10,000 reallocations of cluster means to the pseudo-treatment and control groups (because the number of possible allocations was too large to enumerate for these studies). The ANOVA estimates were also obtained using SAS programs written by the author.$^{7}$
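The Monte Carlo permutation test described above can be sketched as follows: clusters are repeatedly reallocated to pseudo-treatment and control groups while holding the number of treated clusters fixed, and the p-value is the share of reallocations whose absolute difference in mean cluster outcomes is at least as large as the observed one. This is a sketch, not the author's SAS program; the function name, the seed, the tie tolerance, and the toy cluster means are assumptions.

```python
import random


def permutation_test(cluster_means, treated, n_perm=10000, seed=1):
    """Two-sided Monte Carlo permutation p-value for the difference in mean cluster outcomes.

    cluster_means: one mean (or mean residual) per cluster; treated: matching 0/1 flags.
    """
    def diff(flags):
        t = [y for y, f in zip(cluster_means, flags) if f]
        c = [y for y, f in zip(cluster_means, flags) if not f]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = diff(treated)
    rng = random.Random(seed)
    flags = list(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # reallocate clusters, keeping the number treated fixed
        if abs(diff(flags)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_perm


obs, p_value = permutation_test([10, 11, 12, 13, 0, 1, 2, 3], [1, 1, 1, 1, 0, 0, 0, 0])
```

With only 8 clusters there are 70 possible allocations, two of which are as extreme as the observed split, so the sampled p-value should be close to 2/70, or about 0.029. In the studies analyzed here, enumeration was infeasible, which is why reallocations were sampled.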

In what follows, information is first presented for each study on cluster-level sample sizes and weights for 
the FP and SP models. This information is helpful for interpreting the impact findings, which are 
presented second. 



Weights for the Finite-Population and Super-Population Models 

As discussed, a key difference between the FP and SP models involves how clusters (schools or 
classrooms) are weighted in the analysis. In the FP models, clusters are either weighted by their sample 
sizes or equally, whereas in the SP model, clusters are weighted by the inverses of their variances. The 
extent to which ATE results differ across the weighting schemes will depend on the variability of cluster 
sample sizes, ICC values for the outcome variables, and the relationship between cluster-level impacts 
and cluster sample sizes. 



7 The Proc Panel procedure in SAS does not perform the SA ANOVA method that was discussed above, but uses variants of this procedure (which produce results consistent with those presented in this paper).
The top panel of Table 6.1 shows that cluster sample sizes vary for all five studies, but more so for some studies than others. For example, the interquartile range of cluster sizes is about 7 students for the classroom-based Teach for America and Educational Technologies studies, but is 30 students for the school-based Reading Comprehension study. Because cluster sample sizes vary within each study, cluster-level weights always differ for the two FP models. There are also differences across the studies in ICC values (Table 6.1). These intraclass correlations range from 0.06 to 0.12 for models that include baseline covariates and from 0.13 to 0.29 for models that exclude covariates.

Finally, the variability of the weights for the SP models lies between the variability of the weights for the 
two FP models (Table 6.1). For instance, for the Math Curriculum study, the interquartile range for the SP 
weights for the REML model is 4 (bottom panel of Table 6.1), compared to 14 for the FP model where 
clusters are weighted by their sample sizes (top panel of Table 6.1), and 0 for the FP model where clusters 
are weighted equally. 



Impact Findings 

For all studies, the considered FP and SP estimators yield consistent findings concerning the statistical 
significance of the ATE estimates (Table 6.2). The estimators show that (1) elementary school students 
taught by Teach for America teachers performed significantly better on math achievement tests than those 
taught by traditional teachers, (2) the use of selected software products in the classroom did not improve 
first graders’ math test scores, (3) the offer of teacher induction programs for beginning teachers did not 
improve math test scores for second to sixth grade students, (4) the Saxon or Math Expressions math 
curriculum produced significantly higher fifth grade student math test scores than the other tested math 
curricula, and (5) the Reading for Knowledge reading curriculum produced significantly lower fifth grade 
student reading scores than the control (status quo) reading curriculum offered in the study schools. 

For each study, the ATE impact estimates vary by at most 0.035 standard deviations across the eight estimators, and by less than 0.02 standard deviations for every study except Teacher Induction (Table 6.2). For example, the impact estimates in effect size units range from 0.261 to 0.273 for the Math Curriculum study, from 0.126 to 0.129 for the Teach for America study, and from -0.147 to -0.159 for the Reading Comprehension study.

The estimated standard errors (and p-values), however, vary somewhat more across the eight estimators than the ATE point estimates (Table 6.2). For example, standard errors range from 0.038 to 0.075 for the Reading Comprehension study and from 0.035 to 0.050 for the Teacher Induction study, and p-values range from 0.478 to 0.766 for the Educational Technologies study. The finding that the various consistent estimators yield more variable estimates of standard errors than of regression coefficients is a pattern that has often been found in the literature for observational studies.

Findings for the SP Estimators. On the basis of the empirical findings and the theory from above, the SP 
estimators can be divided into two main groups. The first group includes the ANOVA and REML 
estimators that both account for the loss in degrees of freedom in the variance estimates due to the 
regression parameters. Across the five studies, these two estimators yield identical ATE impact estimates, 
and standard errors that differ by at most .003 standard deviations (Table 6.2). The similarity of the 
ANOVA and REML findings is consistent with Baltagi and Chang (1994), who found using simulations 
that the ANOVA method performs well for random effects models. Thus, there is reason for education 
researchers to consider using the ANOVA estimator more often in RCTs. 

The second group of SP estimators includes the model-based and empirical sandwich GEE estimators and 
the ML estimator. Across the five studies, these three estimators yield ATE impact estimates that differ 
from each other by at most .002 standard deviations, and standard errors that typically differ from each
Table 6.1: Information on Weighting Schemes for the FP and SP Models, by Study

                                     Teach for   Educational    Teacher     Math         Reading
Statistic                            America     Technologies   Induction   Curriculum   Comprehension

Distribution of Cluster Sizes (Percentiles)
  10th                               11          10             9           21           28
  25th                               14          13             13          27           39
  50th                               17          16             20          31           57
  75th                               21          19             29          41           69
  90th                               23          22             47          49           99

ICCs for the SP REML Model
  No covariates                      0.29        0.20           0.14        0.19         0.13
  Covariates                         0.08        0.12           0.07        0.06         0.06

Distribution of Cluster-Level Weights for the SP REML Model With Covariates (Percentiles)a
  10th                               14          12             17          29           49
  25th                               16          13             21          32           54
  50th                               17          14             26          33           60
  75th                               19          15             29          36           62
  90th                               20          16             33          38           66

Sample Sizes
  Clusters (p = share treated)       95 (0.44)   137 (0.57)     173 (0.52)  39 (0.46)    39 (0.46)
  Students                           1,630       2,176          4,381       1,309        2,256

Source: Data from studies listed in Appendix Table B.1.
a The weights sum to the total student sample size.
Table 6.2: Regression-Adjusted Impact Results, by Study

                              Teach for     Educational   Teacher       Math          Reading
Model and Estimator           America       Technologies  Induction     Curriculum    Comprehension

Finite Population Model
1. GEE (Empirical)
  a. Clusters Weighted by     .126 (.048)   .032 (.046)   -.022 (.037)  .261 (.059)   -.147 (.052)
     Sample Sizes             (.008)*       (.478)        (.548)        (.000)*       (.005)*
  b. Clusters Weighted        .126 (.047)   .014 (.046)   .013 (.045)   .273 (.061)   -.159 (.059)
     Equally                  (.007)*       (.766)        (.782)        (.000)*       (.007)*
     Permutation Tests        (.005)*       (.738)        (.737)        (.000)*       (.000)*

Super-Population Model
2. Balanced Design            .126 (.055)   .014 (.044)   .013 (.050)   .273 (.068)   -.159 (.075)
                              (.025)*       (.759)        (.802)        (.000)*       (.051)
3. ANOVA                      .129 (.055)   .019 (.045)   -.001 (.043)  .269 (.066)   -.159 (.069)
                              (.023)*       (.663)        (.976)        (.000)*       (.038)*
4. ML                         .128 (.048)   .020 (.042)   -.005 (.036)  .268 (.057)   -.153 (.039)
                              (.007)*       (.637)        (.888)        (.000)*       (.000)*
5. REML                       .129 (.055)   .019 (.044)   -.001 (.043)  .269 (.067)   -.159 (.072)
                              (.020)*       (.661)        (.981)        (.000)*       (.027)*
6. GEE
  a. Model-Based              .128 (.049)   .020 (.043)   -.006 (.035)  .268 (.055)   -.151 (.038)
                              (.008)*       (.648)        (.859)        (.000)*       (.000)*
  b. Empirical                .128 (.047)   .020 (.045)   -.006 (.040)  .268 (.060)   -.151 (.053)
                              (.007)*       (.661)        (.874)        (.000)*       (.004)*

Source: Data from studies listed in Appendix Table B.1. See Table 6.1 for sample sizes.

Notes: In each cell, the first line shows the ATE impact estimate and its estimated standard error (in parentheses), and the second line shows the p-value; the Permutation Tests row shows the p-values for the permutation tests for Model 1b. Impact estimates are regression-adjusted using the covariates indicated in Appendix Table B.1.

SAS routines were used to estimate the models except for the ANOVA and permutation tests, which were performed using SAS programs written by the author. Let CLUS denote the cluster codes, T the treatment dummy, Y the outcome, YC the cluster-level mean outcome, X the list of covariates centered at their cluster-level means, XC the cluster-level mean covariates, and D the input dataset. The following code was then used to estimate the models:

Models 1a and 1b: proc genmod data=D; class CLUS; model Y=T X XC / dist=normal; repeated subject=CLUS / type=ind; run; (A weight statement was used for Model 1b to weight clusters equally.)

Model 2: proc reg data=D; model YC=T XC; run;

Model 4: proc mixed data=D method=ml; class CLUS; model Y=T X XC / solution; random CLUS; run;

Model 5: proc mixed data=D; class CLUS; model Y=T X XC / solution; random CLUS; run;

Models 6a and 6b: proc genmod data=D; class CLUS; model Y=T X XC / dist=normal; repeated subject=CLUS / type=exch modelse; run;

*The ATE impact estimate is significantly different from zero at the 0.05 level, two-tailed test.
other by less than .005 standard deviations (Table 6.2). The similarity of estimates for the two GEE
estimators suggests that the exchangeable error structure is appropriate for the data. The GEE and ML 
methods produce smaller standard errors than the REML and ANOVA methods (Table 6.2). This finding 
is expected for the ML method, which does not adjust for the degrees of freedom loss due to the 
estimation of the regression parameters. 

Finally, the simple balanced design method produces impact and standard error estimates that are 
consistent with those from the other SP methods, even though this estimator does not account for 
unbalanced cluster sizes (Table 6.2). Thus, there is good reason to use this simple between-cluster 
estimator to check the robustness of study findings obtained using the other more complex methods. 

Findings for the FP Estimators. Empirical results for the two FP models are displayed in the top panels 
of Table 6.2 and labeled as “Model 1a” and “Model 1b.” Differences in the ATE impact estimates for 
these two FP models range from 0 to .035 standard deviations across the studies, because of differences in 
weighting schemes. The differences are most pronounced for the Educational Technologies and Teacher 
Induction studies, where the estimated impacts are not statistically significant. 
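The weighting difference between Models 1a and 1b can be seen in a covariate-free sketch. It assumes that Model 1b gives each student in cluster i a weight of 1/m_i, so that each cluster counts equally. The function name and data are hypothetical, and the actual models also include covariates.

```python
import numpy as np


def fp_impacts(y, cluster, treat):
    """Hypothetical helper: difference-in-means impacts under two FP weighting schemes.

    Model 1a analog: every student weighted equally.
    Model 1b analog: each student weighted by 1 / (cluster size), so every
    cluster contributes equally within its research group.
    """
    y, cluster, treat = np.asarray(y, float), np.asarray(cluster), np.asarray(treat)
    tr = treat == 1
    student_weighted = y[tr].mean() - y[~tr].mean()
    _, inverse, sizes = np.unique(cluster, return_inverse=True, return_counts=True)
    w = 1.0 / sizes[inverse]  # per-student weight 1 / m_i
    cluster_weighted = (np.average(y[tr], weights=w[tr])
                        - np.average(y[~tr], weights=w[~tr]))
    return student_weighted, cluster_weighted


y = [3, 5, 6, 1, 3, 2, 2, 2]
cluster = [0, 0, 1, 2, 2, 3, 3, 3]
treat = [1, 1, 1, 0, 0, 0, 0, 0]
a, b = fp_impacts(y, cluster, treat)
print(a, b)
```

In this toy example the two schemes disagree because the small treatment cluster has a high mean. When cluster sizes and impacts are related, the weighting choice alone can move the estimate.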

The ATE point estimates for the FP and SP models typically differ by less than .005 standard deviations for 
the three studies with statistically significant impact estimates (the Teach for America, Math Curriculum, 
and Reading Comprehension studies; Table 6.2). Furthermore, across all five studies, the standard error 
estimates for the FP models are similar to each other and to those for the empirical sandwich GEE 
estimator for the SP model (Table 6.2); the pairwise differences in standard errors are all less than .007 
standard deviations. However, as discussed, the standard error estimates for the FP models are 
conservative, because they ignore precision gains from the difficult-to-estimate $S_\tau^2$ terms in (6) and (12). 

Finally, for FP Model 1b, the permutation and parametric hypothesis tests yield similar p-values (Table 
6.2). For example, the respective p-values are .005 and .008 for the Teach for America study, .766 and 
.738 for the Educational Technologies study, and .000 and .007 for the Reading Comprehension study. 
Thus, the normality assumption underlying the parametric tests appears to be supported by the 
nonparametric permutation tests, which are much more computationally burdensome. 
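A permutation test of this kind can be sketched by re-randomizing cluster treatment labels. This minimal version makes three simplifying choices. It assumes no covariates, it uses a Model 1b-style statistic (the difference in mean cluster means), and it enumerates every assignment exactly, which is practical only for a handful of clusters. The function name is hypothetical, and this is not the author's SAS program.

```python
from itertools import combinations

import numpy as np


def permutation_pvalue(cluster_means, treated):
    """Hypothetical helper: exact two-sided randomization p-value.

    Enumerates every assignment that keeps the number of treatment clusters
    fixed; feasible only when the number of clusters is small.
    """
    means = np.asarray(cluster_means, float)
    treated = np.asarray(treated, bool)
    n, n_t = len(means), int(treated.sum())

    def impact(mask):
        return means[mask].mean() - means[~mask].mean()

    observed = impact(treated)
    extreme = total = 0
    for idx in combinations(range(n), n_t):
        mask = np.zeros(n, bool)
        mask[list(idx)] = True
        total += 1
        # Small tolerance so that ties with the observed statistic count as extreme
        if abs(impact(mask)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


obs, p = permutation_pvalue([4.0, 6.0, 2.0, 2.0], [True, True, False, False])
print(obs, p)
```

With many clusters, full enumeration is infeasible. A common alternative is to draw a large random sample of assignments and report the share that are at least as extreme as the observed statistic.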
Chapter 7: Summary and Conclusions 



This paper has examined the estimation of two-stage clustered RCT designs in education research using 
the Neyman causal inference framework that underlies experiments. The key distinction between the 
considered causal models is whether potential treatment and control group outcomes are considered to be 
fixed for the study population (the FP model) or randomly selected from a vaguely defined super-population (the SP model). 

In the FP model, the only source of randomness is treatment status, and a clustered design results only 
because students in the same cluster share the same treatment status. The relevant impact parameter for 
this model is the average treatment effect for those in the study sample; thus, the impact results are 
internally valid only. The asymptotic variance for the FP model (derived in this paper) can be 
estimated using a GEE estimator assuming an independence working correlation structure. Two weighting 
options for this model are (1) to weight each student equally (the OLS approach) or (2) to weight each 
cluster equally (to estimate ATEs for the average cluster in the sample). The FP variance estimators are 
likely to be conservative, however, because they ignore precision gains from difficult-to-estimate variance 
terms that represent the extent to which treatment effects vary and co-vary across students in the same 
cluster. Thus, in theory, the FP estimators could yield more precise ATE estimates than the SP estimators, 
but it is difficult to realize these precision gains in practice. 
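With an identity link, GEE with an independence working correlation produces ordinary (or weighted) least squares point estimates, and its empirical standard error is the cluster-robust sandwich. A minimal NumPy sketch for the no-covariate case follows. It applies no small-sample correction (software defaults may differ), and the function name is hypothetical. Passing per-student weights of 1/m_i gives the cluster-weighted, Model 1b-style analog.

```python
import numpy as np


def gee_independence_se(y, cluster, treat, weights=None):
    """Hypothetical helper: impact and empirical (sandwich) SE, independence working correlation.

    Point estimates equal (weighted) least squares; the variance is the
    cluster-robust sandwich (X'WX)^-1 [sum_c s_c s_c'] (X'WX)^-1.
    No small-sample correction is applied.
    """
    y = np.asarray(y, float)
    cluster = np.asarray(cluster)
    w = np.ones_like(y) if weights is None else np.asarray(weights, float)
    X = np.column_stack([np.ones_like(y), np.asarray(treat, float)])
    XtW = X.T * w                        # X'W, shape (2, N)
    bread = np.linalg.inv(XtW @ X)       # (X'WX)^{-1}
    beta = bread @ (XtW @ y)
    resid = y - X @ beta
    meat = np.zeros((2, 2))
    for c in np.unique(cluster):
        in_c = cluster == c
        score = XtW[:, in_c] @ resid[in_c]  # cluster-level score vector
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta[1], np.sqrt(cov[1, 1])


y = [3, 5, 6, 1, 3, 2, 2, 2]
cluster = [0, 0, 1, 2, 2, 3, 3, 3]
treat = [1, 1, 1, 0, 0, 0, 0, 0]
est, se = gee_independence_se(y, cluster, treat)
w_1b = [0.5, 0.5, 1.0, 0.5, 0.5, 1 / 3, 1 / 3, 1 / 3]  # 1 / m_i for each student
est_1b, se_1b = gee_independence_se(y, cluster, treat, weights=w_1b)
print(est, se, est_1b, se_1b)
```

Because residuals are summed within clusters before squaring, the sandwich automatically reflects the shared treatment status of students in the same cluster. This is the clustering feature of the FP model described above.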

In the SP model, cluster- and student-level potential outcomes are considered to be randomly sampled 
from respective super-population distributions. In this framework, the relevant ATE parameter is the 
intervention effect for the average cluster in the super-population. Thus, impact findings are assumed to 
generalize outside the study sample, although it is often difficult to precisely define the study universe. 

For estimating the SP model, the paper discussed key features of several feasible GLS estimators (ML, 
REML, ANOVA, and GEE estimators) assuming an exchangeable random effects error structure. For 
these estimators, clusters are weighted by the inverses of their variances, and the variability of these 
weights lies between the variability of the weights under the two FP weighting schemes. 
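The claim about weight variability can be illustrated under a random-intercept model without covariates. In that model the variance of cluster i's mean outcome is $\sigma_u^2 + \sigma_e^2/m_i$. The sketch below assumes the variance components are known, whereas in practice they come from ML, REML, or ANOVA estimation. It then compares normalized weights using their coefficient of variation. The names and numbers are illustrative.

```python
import numpy as np


def cluster_weights(sizes, sigma2_u, sigma2_e):
    """Hypothetical helper: normalized cluster weights under three schemes.

    sigma2_u: between-cluster variance component; sigma2_e: within-cluster
    (student-level) variance. Both are treated as known here.
    """
    m = np.asarray(sizes, float)
    schemes = {
        "clusters_equal": np.ones_like(m),                    # FP Model 1b
        "students_equal": m,                                  # FP Model 1a (OLS)
        "inverse_variance": 1.0 / (sigma2_u + sigma2_e / m),  # SP feasible GLS
    }
    return {name: w / w.sum() for name, w in schemes.items()}


def coef_of_variation(w):
    return w.std() / w.mean()


weights = cluster_weights([5, 20, 80], sigma2_u=0.1, sigma2_e=0.9)
for name, w in weights.items():
    print(name, np.round(w, 3), round(coef_of_variation(w), 3))
```

The inverse-variance weight rises with cluster size but levels off at $1/\sigma_u^2$. In this example its coefficient of variation therefore falls between that of equal cluster weights and that of weights proportional to cluster size.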

Using data from five recent large-scale clustered RCTs in the education area, the empirical analysis 
estimated ATEs and their standard errors using the considered estimators. For all five studies, the 
considered estimators yield consistent findings concerning statistical significance. However, although the 
estimated impacts are similar across the estimators, the standard errors (and hence, p-values) differ more 
across the estimators. This suggests that in particular studies, policy conclusions could differ depending 
on the estimator used. 

The choice of the primary estimation method and cluster-level weighting scheme should best fit 
evaluation research questions and objectives, and should be specified and justified in the analysis 
protocols. However, there might not always be a scientific basis for making these benchmark choices 
(that is, there might not be a “true” underlying statistical model for the study). Thus, a key 
recommendation from this paper is that education researchers consider testing the sensitivity of their 
benchmark impact findings using alternative estimation methods, rather than relying solely on the 
methods with which they are most comfortable. These sensitivity analyses could be important for ruling 
out the possibility that the impact findings are driven by specific distributional assumptions about the data 
and asymptotic results. Furthermore, it is recommended that findings from sensitivity analyses be 
reported in study appendixes, that attempts be made to explain discrepancies between sensitivity and 
benchmark analysis findings, and that the robustness of results be reflected in the study conclusions. 

Researchers currently most often report impact findings using the SP framework based on REML or ML 
methods. Results in this paper suggest that, in sensitivity analyses, impacts could also be estimated 
using other methods, such as the balanced-design, GEE, and FP estimators. The ANOVA 
method is another approach that could be used more often in education research. 
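For readers less familiar with the ANOVA approach, one textbook form of the ANOVA (method-of-moments) variance-component estimator for an unbalanced one-way random-effects model is sketched below. It sets $\sigma_e^2$ to the within-cluster mean square. It sets $\sigma_u^2$ to $(MSB - MSW)/n_0$, where $n_0$ is an adjusted average cluster size. The sketch omits covariates and truncates negative estimates at zero. It shows the general form of the method under these assumptions and is not the author's program; the function name is hypothetical.

```python
import numpy as np


def anova_variance_components(y, cluster):
    """Hypothetical helper: ANOVA variance components, one-way random effects.

    Returns (sigma2_u, sigma2_e, icc). Negative between-cluster estimates are
    truncated at zero, a common convention.
    """
    y, cluster = np.asarray(y, float), np.asarray(cluster)
    ids, inverse, m = np.unique(cluster, return_inverse=True, return_counts=True)
    k, N = len(ids), len(y)
    cmeans = np.bincount(inverse, weights=y) / m    # cluster means
    grand = y.mean()
    msb = (m * (cmeans - grand) ** 2).sum() / (k - 1)
    msw = ((y - cmeans[inverse]) ** 2).sum() / (N - k)
    n0 = (N - (m ** 2).sum() / N) / (k - 1)          # adjusted average cluster size
    sigma2_u = max(0.0, (msb - msw) / n0)
    sigma2_e = msw
    return sigma2_u, sigma2_e, sigma2_u / (sigma2_u + sigma2_e)


s_u, s_e, icc = anova_variance_components([1, 3, 5, 7], [0, 0, 1, 1])
print(s_u, s_e, icc)
```

Unlike ML and REML, this estimator needs no iterative optimization. That makes it a convenient, assumption-light cross-check for a sensitivity analysis.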

Finally, the choice of whether to adopt the FP or the SP framework is a difficult philosophical issue. In 
practice, however, the two approaches tend to blur, because standard estimation procedures do not 
account for the precision gains from the FP model, and the empirical results presented in this paper suggest 
that the FP and SP models yield similar impact findings. Furthermore, the two approaches blur under 
balanced designs. Nonetheless, researchers should understand the assumptions underlying the SP and FP 
approaches and their implications for generalizing and interpreting the impact findings. 
Appendix A: Proofs 



Proof of Lemma 1 

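Applying standard OLS methods to (3) yields $\hat\beta_{1SR} = \bar y_T - \bar y_C$. To calculate the asymptotic moments of $\hat\beta_{1SR}$, we express $\hat\beta_{1SR}$ as follows: 

(A.1) $$\hat\beta_{1SR} - \beta_1 = \frac{1}{d}\sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)\eta_{ij},$$ 

where $d = \sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)^2$. Substituting for $\eta_{ij}$ using (3) yields: 

(A.2) $$\hat\beta_{1SR} - \beta_1 = \frac{1}{d}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left[\alpha_{ij}(T_i - p) + \tau_{ij}(T_i - p)^2\right] = \frac{1}{d}\sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)\,l_{ij} = \frac{1}{d}\sum_{i=1}^{n}\sum_{j=1}^{m_i}T_i\,l_{ij},$$ 

where $l_{ij} = \alpha_{ij} + (1-2p)\tau_{ij}$. The second equality uses $(T_i - p)^2 = (1-2p)(T_i - p) + p(1-p)$ for binary $T_i$, together with $\sum_i\sum_j \tau_{ij} = 0$. The third equality uses $\sum_i\sum_j l_{ij} = 0$, which holds because $\sum_i\sum_j \alpha_{ij} = \sum_i\sum_j \tau_{ij} = 0$. 

Note that $E(d) = nmp(1-p)$. Thus, $E(\hat\beta_{1SR} - \beta_1) \to \sum_{i=1}^{n}\sum_{j=1}^{m_i} l_{ij}\,p/[nmp(1-p)] = 0$ because $\sum_i\sum_j l_{ij} = 0$. Thus, $\hat\beta_{1SR}$ is asymptotically unbiased. 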

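Using (A.2), the variance of $\hat\beta_{1SR}$ is: 

$$Var(\hat\beta_{1SR}) \approx \frac{1}{[nmp(1-p)]^2}\left[\sum_{i=1}^{n}Var(T_i)\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}l_{ij}l_{ik} + \sum_{i=1}^{n}\sum_{i' \neq i}Cov(T_i, T_{i'})\sum_{j=1}^{m_i}\sum_{j'=1}^{m_{i'}}l_{ij}l_{i'j'}\right]$$ 
$$= \frac{p(1-p)}{[nmp(1-p)]^2}\left[\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}l_{ij}l_{ik} - \frac{1}{n-1}\sum_{i=1}^{n}\sum_{i' \neq i}\sum_{j=1}^{m_i}\sum_{j'=1}^{m_{i'}}l_{ij}l_{i'j'}\right],$$ 

where the last equality holds because $Var(T_i) = p(1-p)$ and $Cov(T_i, T_{i'}) = -p(1-p)/(n-1)$. Because $\sum_i\sum_j l_{ij} = 0$, it follows that $(\sum_i\sum_j l_{ij})^2 = 0$. The cross-cluster sum inside the brackets therefore equals $-\sum_i\sum_j\sum_k l_{ij}l_{ik}$. Hence, 

(A.3) $$Var(\hat\beta_{1SR}) \approx \frac{n}{n-1}\,\frac{p(1-p)}{[nmp(1-p)]^2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}l_{ij}l_{ik} \approx \frac{n}{n-1}\,\frac{(1-p)^2S_T^2 + p^2S_C^2 + 2p(1-p)S_{TC}}{nmp(1-p)},$$ 

where $S_{TC}$ is the asymptote of $\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{m_i}(Y_{Tij} - \bar Y_T)(Y_{Cik} - \bar Y_C)$, the covariance between the treatment and control potential outcomes for students within the same schools. A more intuitive variance expression is obtained by writing $S_\tau^2$ as $S_\tau^2 = S_T^2 + S_C^2 - 2S_{TC}$. Solving for $S_{TC}$ and substituting into (A.3) yields the variance expression in (7). 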

The asymptotic normality of $\hat\beta_{1SR}$ follows by expressing (A.1) as $\sqrt{nm}\,p(1-p)(\hat\beta_{1SR} - \beta_1) \approx \sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)\,l_{ij}/\sqrt{nm}$ and using a central limit theorem for finite populations (see, for example, Freedman 2008, Hoglund 1978, and Hajek 1960). 
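The variance formula (A.3) can be checked by brute force on a tiny finite population. The sketch assumes equal cluster sizes, so that $d = nmp(1-p)$ exactly and (A.3) holds without approximation. It also reads $S_T^2$, $S_C^2$, and $S_{TC}$ as finite-population quantities built from cluster sums of deviations, for example $S_T^2 = \frac{1}{nm}\sum_i\big(\sum_j (Y_{Tij} - \bar Y_T)\big)^2$. These definitions, the function names, and the potential outcomes are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def randomization_moments(YT, YC, n_treated):
    """Hypothetical helper: exact mean and variance of ybar_T - ybar_C over all assignments.

    YT, YC: (n, m) arrays of fixed treatment and control potential outcomes
    (n clusters of equal size m). Every choice of n_treated clusters is equally likely.
    """
    YT, YC = np.asarray(YT, float), np.asarray(YC, float)
    n = YT.shape[0]
    estimates = []
    for idx in combinations(range(n), n_treated):
        t = np.zeros(n, bool)
        t[list(idx)] = True
        estimates.append(YT[t].mean() - YC[~t].mean())
    estimates = np.array(estimates)
    return estimates.mean(), estimates.var()  # ddof=0: exact randomization variance


def variance_a3(YT, YC, n_treated):
    """Hypothetical helper: right-hand side of (A.3) with S terms built from cluster sums."""
    YT, YC = np.asarray(YT, float), np.asarray(YC, float)
    n, m = YT.shape
    p = n_treated / n
    uT = (YT - YT.mean()).sum(axis=1)   # sum_j (Y_Tij - Ybar_T) for each cluster
    uC = (YC - YC.mean()).sum(axis=1)
    S2T, S2C, STC = (uT @ uT) / (n * m), (uC @ uC) / (n * m), (uT @ uC) / (n * m)
    bracket = (1 - p) ** 2 * S2T + p ** 2 * S2C + 2 * p * (1 - p) * STC
    return n / (n - 1) * bracket / (n * m * p * (1 - p))


YT = [[2, 4], [6, 6], [1, 3]]
YC = [[1, 1], [2, 4], [0, 2]]
mean_est, var_est = randomization_moments(YT, YC, n_treated=1)
ate = np.mean(YT) - np.mean(YC)
print(mean_est, ate, var_est, variance_a3(YT, YC, n_treated=1))
```

The enumerated mean matches the finite-population ATE, and the enumerated variance matches (A.3). With unequal cluster sizes, $d$ is random and the agreement would hold only approximately.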



Proof of Lemma 2 

The multiple regression estimator for $\beta_1$ is as follows: 

(A.4) $$\hat\beta_{1MR} = [\mathbf{T}'(I - P_Q)\mathbf{T}]^{-1}\mathbf{T}'(I - P_Q)Y,$$ 

where $\mathbf{T}$ is an $M \times 1$ vector containing $T_i - p$ terms for the full sample, $I$ is the $M \times M$ identity matrix, and 
$P_Q = Q(Q'Q)^{-1}Q'$ is the projection matrix, where $Q$ is an $M \times q$ matrix of covariates (that are centered 
around the grand means). $Y$ is an $M \times 1$ vector of student outcomes. 

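If we substitute for $Y$ in (A.4) using the true model in (3), then $\hat\beta_{1MR}$ can be expressed as follows: 

(A.5) $$\hat\beta_{1MR} = \left[\frac{\mathbf{T}'(I - P_Q)\mathbf{T}}{nm}\right]^{-1}\frac{\mathbf{T}'(I - P_Q)[\kappa\beta_0 + \mathbf{T}\beta_1 + \boldsymbol\eta]}{nm} = \beta_1 + \left[\frac{\mathbf{T}'(I - P_Q)\mathbf{T}}{nm}\right]^{-1}\left[\frac{\mathbf{T}'\boldsymbol\eta}{nm} - \frac{\mathbf{T}'P_Q\boldsymbol\eta}{nm}\right],$$ 

where $\kappa$ is a column of 1s and $\boldsymbol\eta$ is a vector of error terms in (3). This estimator is biased in finite 
samples. However, we show that the bias tends to zero as $n$ approaches infinity by examining the limiting 
values of each bracketed term: 

$$\frac{\mathbf{T}'(I - P_Q)\mathbf{T}}{nm} \xrightarrow{p} p(1-p),$$ 

$$\frac{\mathbf{T}'\boldsymbol\eta}{nm} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)\alpha_{ij}}{nm} + \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i}(T_i - p)^2\tau_{ij}}{nm} \xrightarrow{p} 0 + p(1-p)(0) = 0,$$ 

so that $\mathbf{T}$ and $\boldsymbol\eta$ are asymptotically uncorrelated, and: 

$$\frac{\mathbf{T}'P_Q\boldsymbol\eta}{nm} = \left[\frac{\mathbf{T}'Q}{nm}\right]\left[\frac{Q'Q}{nm}\right]^{-1}\left[\frac{Q'\boldsymbol\eta}{nm}\right] \xrightarrow{p} 0,$$ 

where $\xrightarrow{p}$ denotes convergence in probability. Thus, $\hat\beta_{1MR}$ is a consistent estimator. 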



To calculate the asymptotic variance of $\hat\beta_{1MR}$, we apply an asymptotic expansion to (A.5): 

(A.6) $$\hat\beta_{1MR} - \beta_1 = \frac{\mathbf{T}'\boldsymbol\eta}{nmp(1-p)} - \frac{\mathbf{T}'P_Q\boldsymbol\alpha}{nmp(1-p)} + o_p(1/n),$$ 

where $o_p(1/n)$ signifies terms of order $1/n$. Note that the first term on the right-hand side of (A.6) pertains 
to the regression estimator without covariates. Note also that for the second term, $\mathbf{T}'P_Q\boldsymbol\eta \approx \mathbf{T}'P_Q\boldsymbol\alpha$. 

Thus, (A.6) can be expressed as follows: 

(A.7) $$\hat\beta_{1MR} - \beta_1 = \frac{1}{nmp(1-p)}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left[\alpha_{ij} + (1-2p)\tau_{ij} - q_{ij}(Q'Q)^{-1}Q'\boldsymbol\alpha\right]T_i + o_p(1/n),$$ 

where $q_{ij}$ is a row vector of covariates for student $j$ in cluster $i$. 

The term inside the brackets in (A.7) sums to zero because $\sum_i\sum_j \alpha_{ij} = \sum_i\sum_j \tau_{ij} = 0$, and 
$\sum_i\sum_j q_{ij}(Q'Q)^{-1}Q'\boldsymbol\alpha = 0$ because it is the sum of fitted values when $\boldsymbol\alpha$ is regressed on $Q$. 
Thus, if we define $l_{ij}$ as the bracketed term in (A.7), then $\sum_i\sum_j l_{ij} = 0$, and we can use the same 
methods as for the regression estimator in Lemma 1 to derive the asymptotic variance of $\hat\beta_{1MR}$ in (10). The 
asymptotic normality of $\hat\beta_{1MR}$ follows from (A.6) because both $\mathbf{T}'\boldsymbol\eta/\sqrt{nm}$ and $\mathbf{T}'P_Q\boldsymbol\alpha/\sqrt{nm}$ are 
asymptotically normal. 
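The estimator in (A.4) can be computed directly in matrix form. In the sketch below, which assumes NumPy and uses hypothetical names, the treatment vector is centered at its student-level sample mean rather than at $p$. With that centering, and with covariates centered at their grand means, the Frisch-Waugh-Lovell theorem implies that (A.4) equals the treatment coefficient from an OLS regression of $Y$ on an intercept, $T$, and $Q$. The example checks this on simulated data.

```python
import numpy as np


def beta_mr(y, t_centered, Q):
    """Hypothetical helper: regression-adjusted impact in the form of (A.4).

    t_centered: treatment indicator minus its mean; Q: covariates centered at
    their grand means (M x q).
    """
    y = np.asarray(y, float)
    T = np.asarray(t_centered, float)
    Q = np.asarray(Q, float)
    P_Q = Q @ np.linalg.solve(Q.T @ Q, Q.T)  # projection onto the columns of Q
    M = np.eye(len(y)) - P_Q                 # residual-maker I - P_Q
    return (T @ M @ y) / (T @ M @ T)


rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(6), [3, 4, 2, 5, 3, 4])  # unequal cluster sizes
treat = np.isin(cluster, [0, 2, 4]).astype(float)      # cluster-level assignment
Q = rng.normal(size=(cluster.size, 2))
Q -= Q.mean(axis=0)                                    # center at grand means
y = 1 + 0.5 * treat + Q @ np.array([0.3, -0.2]) + rng.normal(size=cluster.size)

X = np.column_stack([np.ones(cluster.size), treat, Q])
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_mr(y, treat - treat.mean(), Q), ols[1])
```

Forming the $M \times M$ projection matrix is fine for illustration but wasteful for real data sets. In practice one would regress $Y$ and $T$ on $Q$ and work with the residuals instead.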
Appendix B: Summary of Data Sources 



Table B.1: Summary of Data Sources 

| Study (Authors; Sponsor)ᵃ | Description of Study | Original and Current Study Populations | Level of Clustering | Outcome for Current Study | Baseline Covariates |
|---|---|---|---|---|---|
| Teach for America Evaluation (Decker et al. 2004; SRF, HF, CC) | Study examined the impact of teachers from Teach for America, a highly selective alternative certification program, on the academic achievement of elementary school students. Students were randomly assigned to classrooms taught by Teach for America teachers or traditional teachers in the same grade and school. | 1st to 5th graders in the 2001-2002 school year; 17 schools in Baltimore, Chicago, Los Angeles, Mississippi Delta, and New Orleans. Current study focuses on 1st graders. | Teacher | Iowa Test of Basic Skills (ITBS) math score | Baseline test scores in reading and math; grade level indicators; school indicators |
| Evaluation of Education Technologies (Dynarski et al. 2007; IES) | Study examined the effects of 16 software products on students' academic achievement in 1st grade reading, 4th grade reading, 6th grade math, and algebra in 33 school districts. Within each participating school, teachers were randomly assigned to use a study product or not. For the purposes of our report, outcomes in 1st and 4th grades are used. | Students in 1st grade, 4th grade, 6th grade, and algebra classes in the 2004-05 school year in 33 districts. Current study focuses on 1st graders. | Teacher | 1st grade Stanford-9 reading NCE score | Baseline test scores; student's age and gender; teacher's gender, experience, and highest degree; school's racial/ethnic composition; percent of school's students eligible for special education and subsidized lunch |
| Evaluation of Comprehensive Teacher Induction Programs (Glazerman et al. 2008; IES) | Study examined the effects of comprehensive teacher induction programs on teacher retention, teachers' classroom practices, and student outcomes. The programs provided beginning teachers with an orientation, mentoring sessions, and professional development. Random assignment of elementary schools took place within 17 participating districts. | Beginning teachers in elementary schools within 17 low-income school districts across 13 states in the 2005-06 school year. Current study focuses on 2nd to 6th graders. | School | District-specific administered test scores (Z-scores) | Student pretest Z-scores, gender, race/ethnicity, free/reduced price lunch status, special education status, grade level; teacher's age, gender, race/ethnicity, teaching and non-teaching experience, certification status, preparation type, educational attainment |
| Achievement Effects of Four Early Elementary School Math Curricula: Findings from First Graders in 39 Schools (Agodini et al. 2009; IES) | Study examined the relative impacts of four math curricula on first-grade mathematics achievement. The curricula were selected to represent diverse approaches to teaching elementary school math in the United States. The four curricula are Investigations in Number, Data, and Space; Math Expressions; Saxon Math; and Scott Foresman-Addison Wesley Mathematics. | First graders in 39 Title I schools in four districts in four states for both the original and current study. For the current study, the treatment group was defined as those in schools receiving the Saxon and Math Expressions curricula, and the control group was defined as those receiving the remaining two curricula. | School | ECLS-K total math assessment scale score in five math content areas | ECLS-K pretest score and seven strata (block) indicator variables |
| Effectiveness of Selected Supplemental Reading Comprehension Interventions (James-Burdumy et al. 2009; IES) | Study examined the impacts of four reading comprehension curricula for a first cohort of fifth graders. The curricula were Project CRISS, ReadAbout, Read for Real, and Reading for Knowledge, and were selected based on public submissions and ratings by an expert review panel. Schools were randomly assigned to one of the four intervention groups or to a control group. | 5th grade students in the 2006-2007 school year in 89 schools in 10 districts for both the original and current study. For the current study, the treatment group was defined as those in schools offering the Reading for Knowledge curriculum, and the control group includes those in schools that were assigned to the study control group. | School | Composite Z-score from the Passage Comprehension Subtest of the Group Reading Assessment and Diagnostic Evaluation (GRADE) and the Science and Social Studies (SS) Reading Comprehension Assessments | Indicators of school urban/rural status; teacher race/ethnicity indicators; district indicators; student pretest scores on the GRADE and SS tests; student race/ethnicity indicators; missing value indicators |

ᵃ Acronyms are defined as follows: IES = Institute of Education Sciences at the U.S. Department of Education; SRF = Smith Richardson Foundation; HF = Hewlett Foundation; CC = Carnegie Corporation. 